Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
A common objection to compression as a test for AI is that humans can't do compression, so it has nothing to do with AI. The reason people can't compress is that compression requires both AI and deterministic computation. The human brain is not deterministic because it is made of neurons, which are noisy analog devices.

In order to compress text well, the compressor must be able to estimate probabilities over text strings, i.e. predict text. If you take a text fragment from a book or article and ask someone to guess the next character, most people could do so more accurately than any program now in existence. This is clearly an AI problem. It takes intelligence to predict text like 6+8=__ or roses are ___. If a data compressor had such knowledge, then it could assign the shortest codes to the most likely answers. Specifically, if the next symbol has probability p, it is assigned a code of length log2(1/p) bits. Such knowledge is useless for human compression because the decompressor must have exactly the same knowledge to generate the same codes. This requires deterministic computation, which is no problem for a machine. Therefore I believe that compression is a valid test for AI, and desirable because it is totally objective, unlike the Loebner prize.

Some known problems with the test:

- To pass the Turing test, a machine must have a model of interactive conversation. The Wikipedia training data lacks examples of dialogs. My argument is that noninteractive text is appropriate for tasks such as OCR, language translation, and broadcast speech recognition, which are more useful than a machine that deliberately makes mistakes and slows its responses to appear human. I think that the problems of learning interactive and noninteractive models have a lot of overlap.

- A language model is insufficient for problems with nontextual I/O such as vision or robotics that require symbols to be grounded. True, but a language model should be a part of such systems.
- We do not know how much compression is equivalent to AI. Shannon gave a very rough estimate of 0.6 to 1.3 bits per character entropy for written English in 1950. There has not been much progress since then in getting better numbers. The best compressors are near the high end of this range. (I did some research in 2000 to try to pin down a better number. http://cs.fit.edu/~mmahoney/dissertation/entropy1.html )

- It has not been shown that AI can learn from just a lot of text. I believe that lexical, semantic, and syntactic models can be learned from unlabeled text. Children seem to do this. I doubt that higher level problem solving abilities can be learned without some coaching from the programmer, but this is allowed.

- The Wikipedia text has a lot of nontext like hypertext links, tables, foreign words, XML, etc. True, but it is 75% text, so a better compressor still needs to compress text better.

Most of these issues were brought up in other newsgroups.
http://groups.google.com/group/comp.ai.nat-lang/browse_frm/thread/9411183ccde5f7a1/#
http://groups.google.com/group/comp.compression/browse_frm/thread/3f096aea993273cb/#
http://groups.google.com/group/Hutter-Prize?lnk=li

I also discuss them here.
http://cs.fit.edu/~mmahoney/compression/rationale.html

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Cc: Bruce J. Klein [EMAIL PROTECTED] Sent: Saturday, August 12, 2006 12:28:30 PM Subject: [agi] Marcus Hutter's lossless compression of human knowledge prize

Hi,

About the Hutter Prize (see the end of this email for a quote of the post I'm responding to, which was posted a week or two ago)...

While I have the utmost respect for Marcus Hutter's theoretical work on AGI, and I do think this prize is an interesting one, I also want to state that I don't think questing to win the Hutter Prize is a very effective path to follow if one's goal is to create pragmatic AGI.
Look at it this way: WinZip could compress that dataset further than *I* could, given a brief period of time to do it. Yet, who is more generally intelligent, me or WinZip?? And who understands the data better?

Or, consider my 9 year old daughter, who is a pretty bright girl but does not yet know how to write computer programs (she seems to have picked up some recessive genes for social well-adjustedness, and is not as geeky as her dad or big brothers...). Without a bunch of further education, she might NEVER be able to compress that dataset further than WinZip (whereas I could do it by writing a better compression algorithm than WinZip, given a bit of time). Yet I submit that she has significantly greater general intelligence than WinZip.

In short: compression as a measure of intelligence is only valid if you leave both processing time, and the contextuality of intelligence, out of the picture.

Similarly, my Novamente AI system is not made to perform rapid
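Mahoney's code-length argument earlier in the thread (a symbol with probability p gets a code of log2(1/p) bits) can be sketched in a few lines. The distribution below is invented for illustration; a real compressor would estimate it from context.

```python
import math

# Hypothetical next-character probabilities after the context "roses are ",
# as a predictive model might estimate them (the numbers are made up).
p = {"r": 0.60, "b": 0.15, "w": 0.10, "y": 0.05, "other": 0.10}

# Ideal (arithmetic-coding) code length for each symbol: log2(1/p) bits.
for sym, prob in sorted(p.items(), key=lambda kv: -kv[1]):
    print(f"{sym!r}: {math.log2(1 / prob):.2f} bits")

# The likeliest continuation gets the shortest code.
assert math.log2(1 / p["r"]) < math.log2(1 / p["b"])
```

Since the probabilities sum to 1, an arithmetic coder can realize these lengths to within a fraction of a bit over a whole file.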
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
First, the compression problem is not in NP. The general problem of encoding strings as the smallest programs that output them is undecidable.

Second, given a model, compression is the same as prediction. A model is a function that maps any string s to an estimated probability p(s). A compressor then maps s to a code of length log2(1/p(s)). The decompressor does the inverse mapping. The compressor and decompressor only need to agree on the model p(), and can then use identical algorithms to assign a mapping. (This step is deterministic, so not possible by humans.) Modeling is the same as prediction because

  p(s) = PROD_i p(s_i | s_1 s_2 ... s_{i-1})

which is the product of conditional probabilities over the next symbol s_i given all of the previous symbols s_1 through s_{i-1} in s.

Third, given a fixed test set, it would be trivial to write a decompressor that memorized it verbatim and compressed it to 0 bytes if we did not include the size of the decompressor in the contest. Instead you have to start with a small amount of knowledge coded into the decompressor and learn the rest from the data itself. This is a test of language learning ability.

It is easy to dismiss compression as unrelated to AGI. How do you test if a machine with only text I/O knows that roses are red? Suppose it sees "red roses", then later "roses are" and predicts "red". An LSA or distant-bigram model will do this.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Russell Wallace [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 12, 2006 5:30:21 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

On 8/12/06, Matt Mahoney [EMAIL PROTECTED] wrote: In order to compress text well, the compressor must be able to estimate probabilities over text strings, i.e. predict text.

Um no, the compressor doesn't need to predict anything - it has the entire file already at hand. The _de_compressor would benefit from being able to predict, e.g.
"roses are red" - the third word need not be sent if the decompressor can be relied on to know what it will be, given the first two. However, this is not permitted by the terms of the prize: the compressed file cannot depend on a knowledge base at the receiving end; it must run on a bare PC. Therefore the challenge is a purely mathematical one (in class NP, given a limit on decompression time), and not related to AGI. Even if the terms of the prize did allow a knowledge base at the receiving end (which would be problematic for a compression benchmark; it would be very difficult to make the test objective), it still wouldn't really be related to AGI. A good decompressor would know that the words "roses are" tend to be followed by the word "red" - but it would not know that the three words in sequence mean that roses are red.

To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/[EMAIL PROTECTED]
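The chain-rule decomposition above (modeling equals prediction) can be illustrated with a toy adaptive model. This is a sketch I'm adding, not code from the thread: the Laplace-smoothed order-0 byte model is far weaker than any competitive compressor, but its deterministic update is exactly what lets a decompressor mirror the compressor.

```python
import math
from collections import Counter

def code_length_bits(s: str) -> float:
    """Ideal compressed size of s, in bits, under a toy adaptive model.

    Uses the chain rule p(s) = PROD_i p(s_i | s_1 ... s_{i-1}), with each
    conditional estimated by Laplace-smoothed byte counts seen so far.
    The model update is deterministic, so a decompressor can mirror it.
    """
    counts = Counter()
    total_bits = 0.0
    for i, c in enumerate(s.encode()):
        p = (counts[c] + 1) / (i + 256)     # conditional probability of byte c
        total_bits += math.log2(1 / p)      # ideal code length for this byte
        counts[c] += 1
    return total_bits

# Predictable text gets a shorter code than random-looking text:
assert code_length_bits("ababababab") < code_length_bits("q7#xZ!mK2p")
```

A real compressor would turn these code lengths into actual bits with an arithmetic coder; the total is within two bits of sum(log2(1/p)).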
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
Hutter's only assumption about AIXI is that the environment can be simulated by a Turing machine.

With regard to forgetting, I think it plays a minor role in language modeling compared to vision and hearing. To model those, you need to understand what the brain filters out. Lossy compression formats like JPEG and MP3 exploit this by discarding what cannot be seen or heard. However, text doesn't work this way. How much can you discard from a text file before it differs noticeably?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 12, 2006 8:53:40 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

So you mean we should leave forgetting out of the picture, just because we don't know how to objectively measure it. Though objectiveness is indeed desired for almost all measurements, it is not the only requirement for a good measurement of intelligence. Someone can objectively measure a wrong property of a system. I haven't been convinced why lossless compression can be taken as an indicator of intelligence, except that it is objective and easy to check. You wrote on your website that "Hutter [21,22] proved that finding the optimal behavior of a rational agent is equivalent to compressing its observations.", but his proof is under certain assumptions about the agent and its environment. Do these assumptions hold for the human mind or AGI in general?

Pei

On 8/12/06, Matt Mahoney [EMAIL PROTECTED] wrote: Forgetting is an important function in human intelligence because the storage capacity of the brain is finite. This is a form of lossy compression, discarding the least important information. Unfortunately, lossy compression cannot be evaluated objectively. We can compare an image compressed with JPEG with an equal sized image compressed by discarding the low order bits of each pixel, and judge the JPEG image to be of higher quality.
JPEG uses a better model of the human visual system by discarding the same information that the human visual perception process does. It is more intelligent. Lossy image compression is a valid but subjective evaluation of models of human vision. There is no objective algorithm to test for image quality. It has to be done by humans.

A lossless image compression contest would not measure intelligence because you are modeling the physics of light and matter, not something that comes from humans. Also, the vast majority of information in a raw image is useless noise, which is not compressible. A good model of the compressible parts would have only a small effect. It is better to discard the noise. We are a long way from understanding vision.

Standing (1973) measured subjects' ability to memorize 10,000 pictures, viewed for 5 seconds each; in a recall test 2 days later he showed pictures and asked whether they were in the earlier set, which subjects answered correctly much of the time [1]. You could achieve the same result if you compressed each picture to about 30 bits and compared Hamming distances. This is a long term learning rate of 6 bits per second for images, or 2 x 10^9 bits over a lifetime, assuming we don't forget anything after 2 days. Likewise, Landauer [2] estimated human long term memory at 10^9 bits based on rates of learning and forgetting. It is also about how much information you can absorb as speech or writing in a lifetime, assuming 150 words per minute at 1 bpc entropy. It seems that the long term learning rate of the brain is independent of the medium. This is why I chose 1 GB of text for the benchmark.

Text compression measures intelligence because it models information that comes from the human brain, not an external source. Also, there is very little noise in text. If a paragraph can be rephrased in 1000 different ways without changing its meaning, it only adds 10 more bits to code which representation was chosen.
That is why lossless compression makes sense.

[1] Standing, L. (1973), Learning 10,000 Pictures, Quarterly Journal of Experimental Psychology (25) pp. 207-222.
[2] Landauer, Tom (1986), How much do people remember? Some estimates of the quantity of learned information in long term memory, Cognitive Science (10) pp. 477-493.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 12, 2006 4:03:55 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

To summarize and generalize data and to use the summary to predict the future is no doubt at the core of intelligence. However, I do not call this process compressing, because the result is not faultless, that is, there is information loss. It is not only because the human brains are noisy analog devices, but because the future
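The arithmetic behind the learning-rate estimates above can be checked in a few lines. All figures are the assumptions quoted in the post (~30 bits per remembered picture at 5 seconds each; 150 words per minute at roughly 1 bit per character, taking ~5 characters per word), not measurements of mine.

```python
# Long-term learning rates implied by the figures quoted above.
image_rate = 30 / 5                # bits/s: ~30 bits/picture, 5 s viewing each
speech_rate = 150 * 5 * 1 / 60     # bits/s: 150 wpm, ~5 chars/word, 1 bpc

print(f"image learning rate: {image_rate:.1f} bits/s")
print(f"speech/reading rate: {speech_rate:.1f} bits/s")

# The two rates are within an order of magnitude of each other, consistent
# with the claim that the learning rate is roughly independent of medium.
assert 0.1 < image_rate / speech_rate < 10
```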
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
I will try to answer several posts here.

First, I said that there is no knowledge that you can demonstrate verbally that cannot also be learned verbally. For simple cases, this is easy to show. If you test for knowledge X by asking question Q, expecting answer A, then you can train a machine "the answer to Q is A". I realize that for many practical cases there could be many questions testing X and you can't anticipate them all. In other words, X could be a procedure or algorithm for generating answers from an intractably large set of questions. For example, X could be the rules for addition or playing chess. In this case, you could train the machine by giving it the algorithm in the form of natural language text (here is how you play chess...).

Humans possess a lot of knowledge that cannot be demonstrated verbally. Examples: how to ride a bicycle, how to catch a ball, what a banana tastes like, what my face looks like. The English language is inadequate to convey such knowledge fully, although some partial knowledge transfer is possible (I have brown hair). Now try to think of questions to test for the parts of the knowledge that cannot be conveyed verbally. Sure, you could ask what color my hair is. Try to ask a question about knowledge that cannot be conveyed verbally to the machine at all. If you can't convey this knowledge to the machine, it can't convey it to you.

An important question is: how much information does a machine need to pass the Turing test? The machine only needs knowledge that can be verbally tested. Information theory says that this quantity cannot exceed the entropy of the training data plus the algorithmic complexity (length of the program) of the machine prior to training. From my argument above, all of the training data can be in the form of text. I estimate that the average adult has been exposed to about 1 GB of speech (transcribed) and writing since birth. This is why I chose 1 GB for the large text benchmark.
I do not claim that the Wikipedia data is the *right* text to train an AI system, but I think it is the right amount, and I believe that the algorithms we would use on the right training set would be very similar to the ones we would use on this training set.

Second, on lossy vs. lossless compression. It would be a good demonstration of AI if we could compress text using lossy techniques and uncompress to different text that had the same meaning. We can already do this at a simple level, e.g. swapping spaces and linefeeds, or substituting synonyms, or swapping the order of articles. We can't yet do this in the more conceptual way that humans could, but I think that a lossless model could demonstrate this capability. For example, an AI-level language model would recognize the similarity of "I ate a Big Mac" and "I ate at McDonalds" by compressing the concatenated pair of strings to a size only slightly larger than either string compressed by itself. This ability could then be used to generate conceptually similar strings (in O(n) time as I described earlier).

Third, on AIXI: this is a mathematically proven result, so there is no need to test it experimentally. The purpose of the Hutter prize is to encourage research in human intelligence with regard to verbally expressible knowledge, not the more general case. The general case is known to be undecidable, or at least intractable in environments controlled by a finite state machine. AIXI requires the assumption that the environment be computable by a Turing machine. I think this is reasonable. People actually do behave like rational agents. If they didn't, we would not have Occam's razor.

Here is an example: you draw 100 marbles from an urn. All of them are red. What do you predict will be the color of the next marble? Answer this way: what is the shortest program you could write that outputs 101 words, where the first 100 are "red"?

Fourth, a program that downloads the Wikipedia benchmark violates the rules of the prize.
The decompressor must run on a computer without a network connection. Rules are here:
http://cs.fit.edu/~mmahoney/compression/textrules.html

-- Matt Mahoney, [EMAIL PROTECTED]
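The "about 1 GB since birth" estimate above can be sanity-checked with back-of-envelope arithmetic. The exposure figures below (hours per day, years, bytes per word) are my own assumptions for illustration, not numbers from the thread.

```python
# Back-of-envelope check of the 1 GB lifetime-language estimate.
bytes_per_word = 6        # ~5 letters plus a space
words_per_minute = 150
hours_per_day = 2.5       # assumed average exposure to speech and text
years = 20

total = words_per_minute * 60 * hours_per_day * 365 * years * bytes_per_word
print(f"{total / 1e9:.1f} GB")   # on the order of 1 GB
```

With these assumptions the total comes out just under 10^9 bytes, which is consistent with the choice of a 1 GB benchmark.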
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
I can't disagree that an AGI with vision and motor control is more useful than one without. Also I agree that humans learn language by integrating a lot of nonverbal knowledge, that humans verbally demonstrate some knowledge learned nonverbally, and nonverbally demonstrate some knowledge learned verbally. Humans reason and solve problems using a model of a world that is learned both verbally and nonverbally. It would be difficult to construct such a model without nonverbal data.

In the last Loebner contest, one of the judges asked the machines, "Which is bigger, a 747 or my big toe?" None of the machines could answer. They lacked a good model of the real world. Cyc has a lot of hand-coded common sense knowledge and an inference engine, so it could probably answer such a question if it was phrased in CycL. However, Cyc lacks natural language ability and had no entry in the Loebner contest. It lacks a complete language model.

I think to get the training data for a good model of the real world as humans experience it, you need to build a humanoid robot so that it can experience all of the things that real people experience. Without this model, you could not pass the Turing test. Not that you couldn't build a world model by other means; it would just take enormous effort, like Cyc.

The goal of AI should not be to pass the Turing test. Who needs a machine that makes arithmetic mistakes and slows down its responses? What we need are machines that know enough natural language to do their jobs. An automated travel agent needs to know that a 747 is an airplane, and it needs to understand you when you say you want to change your reservation to leave next Tuesday. It does not need to know about toes.

I think the Hutter prize will lead to a better understanding of how we learn semantics and syntax. It will lead to language models that enable applications and your operating system to have a working natural language interface.
It will improve the accuracy of text scanning, handwriting recognition, speech recognition, and language translation. It will lead to better spam detection. It will automate a lot of work now done by people on phones. Language modeling is short of AGI, but I think it is an important goal.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, August 13, 2006 3:25:41 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

You've stated that any knowledge that can be demonstrated verbally CAN in principle be taught verbally. I don't agree that this is necessarily true for ANY learning system, but that's not the point I want to argue. My larger point is that this doesn't imply that this is how humans do it. So, if a human has learned a verbal behavior, and has been exposed to 1GB of text, it does not imply that said human has learned said behavior from said text. In fact there is much evidence that this is NOT the case -- this is what the whole literature on symbol grounding is about. Humans happen to learn a lot of their verbal behaviors based on non-verbal stimuli and actions.

But, this is not to say that some other AI system couldn't learn to IMITATE human verbal behaviors based only on studying human behaviors, of course.

IMO, focusing AI narrowly on text processing is a bad direction for near-term AGI research. I think that focusing on symbol grounding and perception/action/cognition integration is a better approach. But this better approach is not likely, in the immediate term, to be the best approach to excelling at the Hutter Prize task. Which gets back to my point that seeking to win the Hutter Prize is probably not a good guide for near-term AGI development.

-- Ben G

On 8/13/06, Matt Mahoney [EMAIL PROTECTED] wrote: I will try to answer several posts here.
Re: [agi] Marcus Hutter's lossless compression of human knowledge prize
Semantic learning from unlabeled text has already been demonstrated and used to improve both text compression (perplexity) and word error rates for speech recognition [1], and to pass the word analogy section of the SAT exams [2]. Semantic models exploit the fact that related words like moon and star tend to appear near each other, forming a fuzzy identity relation. Syntactic learning is possible from unlabeled text because words with the same grammatical role tend to appear in the same immediate context. For example, "the X is" tells you that X is a noun, allowing you to predict sequences like "a X was".

[1] Bellegarda, Jerome R., "Speech recognition experiments using multi-span statistical language models", IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 717-720, 1999.
[2] Turney, Peter D., Measuring Semantic Similarity by Latent Relational Analysis. In Proceedings Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 1136-1141, Edinburgh, Scotland, 2005.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, August 13, 2006 5:25:19 PM Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

I think the Hutter prize will lead to a better understanding of how we learn semantics and syntax.

I have to disagree strongly. As long as you are requiring recreation at the bit level as opposed to the semantic or logical level, you aren't going to learn much at all about semantics or syntax (other than, possibly, relative frequency of various constructs which you can then use to *slightly* better optimize -- maybe well enough to win some money but not well enough to win enough to make it worthwhile, since it is a definite sidetrack from AGI).

Mark
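The co-occurrence idea above (related words like moon and star appearing near each other) can be shown with a toy counter. The three-sentence corpus is invented, and a sentence stands in for a proper context window; real systems use far larger corpora and measures like LSA or mutual information.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: related words co-occur, unrelated words do not.
corpus = [
    "the moon and a star shone in the night sky",
    "a star near the moon was bright last night",
    "the recipe calls for flour and sugar",
]

window = Counter()
for sentence in corpus:
    words = sentence.split()
    for a, b in combinations(words, 2):       # all word pairs in the sentence
        window[frozenset((a, b))] += 1        # unordered co-occurrence count

# "moon" and "star" co-occur; "moon" and "sugar" never do.
assert window[frozenset(("moon", "star"))] > window[frozenset(("moon", "sugar"))]
```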
Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
I've read Charniak's book, Statistical Language Learning. A lot of researchers in language modeling are using perplexity (compression ratio) to compare models. But there are some problems with the way this is done.

1. Many evaluations are done on corpora from the LDC which are not free, like TREC, WSJ, Brown, etc.

2. Many evaluations use offline models. They train on a portion of the data set and evaluate on the rest, or use leave-one-out, or maybe divide into 3 parts including a validation set. This makes it difficult to compare work by different researchers because there is no consistency in the details of these experiments.

3. The input is usually preprocessed in various ways. Normally, case is folded, the words are converted to tokens from a fixed vocabulary, and punctuation is removed. Again there is no consistency in the details, like the size of the vocabulary, whether to include numbers, etc. Also this filtering removes useful information, so it is difficult to evaluate the true perplexity of the model.

I think a good language model will need to combine many techniques in lexical modeling (vocabulary acquisition, stemming, recognizing multiword phrases and compound words, dealing with rare words, misspelled words, capitalization, punctuation and various nontext forms of junk), semantics (distant bigrams, LSA), and syntax (statistical parsers, hidden Markov models) in a uniform framework. Most work is usually in the form of a word trigram model plus one other technique on cleaned up text. Nobody has put all this stuff together. As a result, the best compressors still use byte-level n-gram statistics and at most some crude lexical parsing. I think we can do better.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: J. Storrs Hall, PhD.
[EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 9:37:32 AM Subject: Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

On Tuesday 15 August 2006 00:55, Matt Mahoney wrote: ... To improve compression further, you will need to model semantics and/or syntax. No compressor currently does this.

Has anyone looked at the statistical parsers? There is a big subfield of computational linguistics doing exactly this, cf e.g. Charniak (down the page to statistical parsing) http://www.cs.brown.edu/%7Eec/

I would speculate, btw, that the decompressor should be a virtual machine for some powerful macro-expander (which are equivalent to the lambda calculus, ergo Turing machines) and the probabilistic regularities in the source be reflected in the encoding -- which would be implemented by the executable compressed file.

Josh
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
I realize it is tempting to use lossy text compression as a test for AI because that is what the human brain does when we read text and recall it in paraphrased fashion. We remember the ideas and discard details about the expression of those ideas. A lossy text compressor that did the same thing would certainly demonstrate AI. But there are two problems with using lossy compression as a test of AI:

1. The test is subjective.
2. Lossy compression does not imply AI.

Let's assume we solve the subjectivity problem by having human judges evaluate whether the decompressed output is "close enough" to the input. We already do this with lossy image, audio and video compression (without much consensus).

The second problem remains: ideal lossy compression does not imply passing the Turing test. For lossless compression, it can be proven that it does. Let p(s) be the (unknown) probability that s will be the prefix of a text dialog. Then a machine that can compute p(s) exactly is able to generate response A to question Q with the distribution p(QA)/p(Q), which is indistinguishable from human. The same model minimizes the compressed size, E[log 1/p(s)].

This proof does not hold for lossy compression because different lossless models map to identical lossy models. The desired property of a lossy compressor C is that if and only if s1 and s2 have the same meaning (to most people), then the encodings C(s1) = C(s2). This code will ideally have length log 1/(p(s1)+p(s2)). But this does not imply that the decompressor knows p(s1) or p(s2). Thus, the decompressor may decompress to s1 or s2 or choose randomly between them.
In general, the output distribution will be different from the true distribution p(s1), p(s2), so it will be distinguishable from human even if the compression ratio is ideal.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 9:28:26 AM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

I don't see any point in this debate over lossless vs. lossy compression. Let's see if I can simplify it. The stated goal is compressing human knowledge. The exact same knowledge can always be expressed in a *VERY* large number of different bit strings. Not being able to reproduce the exact bit string is lossy compression when viewed from the bit viewpoint but can be lossless from the knowledge viewpoint. Therefore, reproducing the bit string is an additional requirement above and beyond the stated goal. I strongly believe that this additional requirement will necessitate a *VERY* large amount of additional work not necessary for the stated goal. In addition, by information theory, reproducing the exact bit string will require additional information beyond the knowledge contained in it (since numerous different strings can encode the same knowledge). Assuming optimal compression, also by information theory, additional information will add to the compressed size (i.e. lead to a less optimal result). So the question is "Given that bit-level reproduction is harder, not necessary for knowledge compression/intelligence, and doesn't allow for the same degree of compression, why make life tougher when it isn't necessary for your stated purposes and makes your results (i.e. compression) worse?"

- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 12:55 AM Subject: Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Where will the knowledge to compress text come from? There are 3 possibilities.

1.
externally supplied, like the lexical models (dictionaries) for paq8h and WinRK.
2. learned from the input in a separate pass, like xml-wrt|ppmonstr.
3. learned online in one pass, like paq8f and slim.

These all have the same effect on compressed size. In the first case, you increase the size of the decompressor. In the second, you have to append the model you learned from the first pass to the compressed file so it is available to the decompressor. In the third case, compression is poor at the beginning. From the viewpoint of information theory, there is no difference in these three approaches. The penalty is the same.

To improve compression further, you will need to model semantics and/or syntax. No compressor currently does this. I think the reason is that it is not worthwhile unless you have hundreds of megabytes of natural language text. In fact, only the top few compressors even have lexical models. All the rest are byte oriented n-gram models.

A semantic model would know what words are related, like "star" and "moon". It would learn this by their tendency to appear together. You can build a dictionary of such knowledge from the data set i
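The claim earlier in the thread that an exact model p(s) yields a dialog model via p(QA)/p(Q) can be sketched with a made-up toy distribution, where three short strings stand in for all possible dialogs (the strings and probabilities are invented for illustration).

```python
# Toy illustration of generating responses from a lossless model.
p = {
    "Q: 6+8? A: 14": 0.7,
    "Q: 6+8? A: 15": 0.1,
    "Q: hi A: hello": 0.2,
}

def p_prefix(prefix: str) -> float:
    """p(Q): total probability of all dialogs starting with the prefix."""
    return sum(prob for s, prob in p.items() if s.startswith(prefix))

q = "Q: 6+8? A:"
# Response distribution p(QA)/p(Q), conditioned on the question:
cond = {s: prob / p_prefix(q) for s, prob in p.items() if s.startswith(q)}
best = max(cond, key=cond.get)
assert best.endswith("14")   # the model's most likely answer
```

Sampling from cond instead of taking the argmax gives the indistinguishable-from-human response distribution the argument describes.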
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
You could use Keogh's compression dissimilarity measure to test for inconsistency. http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf

CDM(x,y) = C(xy)/(C(x)+C(y)), where x and y are strings, and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0 if x and y do not share any information. Then, CDM("it is hot", "it is very warm") < CDM("it is hot", "it is cold"), assuming your compressor uses a good language model.

Now if only we had some test to tell which compressors have the best language models...

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 3:22:10 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Could you please write a test program to objectively test for lossy text compression using your algorithm?

Writing the test program for the decompressing program is relatively easy. Since the requirement was that the decompressing program be able to recognize when a piece of knowledge is in the corpus, when its negation is in the corpus, when an incorrect substitution has been made, and when a correct substitution has been made -- all you/I would need to do is invent (or obtain -- see two paragraphs down) a reasonably sized set of knowledge pieces to test, put them in a file, feed them to the decompressing program, and automatically grade its answers as to which category each falls into. A reasonably small number of test cases should suffice as long as you don't advertise exactly which test cases are in the final test, but once you're having competitors generate each other's tests, you can go hog-wild with the number.

Writing the test program for the compressing program is also easy, but developing the master list of inconsistencies is going to be a real difficulty -- unless you use the various contenders themselves to generate various versions of the list.
I strongly doubt that most contenders will get false positives, but strongly suspect that finding all of the inconsistencies will be a major area for improvement as the systems become more sophisticated. Note also that minor modifications of any decompressing program should also be able to create test cases for your decompressor test. Simply ask it for a random sampling of knowledge, for the negations of a random sampling of knowledge, for some incorrect substitutions, and some hierarchical substitutions of each type. Any *real* contenders should be able to easily generate the tests for you.

You can start by listing all of the inconsistencies in Wikipedia.

See paragraph 2 above.

To make the test objective, you will either need a function to test whether two strings are inconsistent or not, or else you need to show that people will never disagree on this matter.

It is impossible to show that people will never disagree on a matter. On the other hand, a knowledge compressor is going to have to recognize when two pieces of knowledge conflict (i.e. when two strings parse into knowledge statements that cannot coexist). You can always have a contender evaluate whether a competitor's "inconsistencies" are incorrect and then do some examination by hand on a representative sample where the contender says it can't tell (since, again, I suspect you'll find few misidentified inconsistencies -- but that finding all of the inconsistencies will be ever subject to improvement).

Lossy compression does not imply AI. A lossy text compressor that did the same thing (recall it in paraphrased fashion) would certainly demonstrate AI.

I disagree that these are inconsistent. Demonstrating and implying are different things.

I didn't say that they were inconsistent. What I meant to say was that a decompressing program that is able to output all of the compressed file's knowledge in ordinary English would, in your words, "certainly demonstrate AI".
Given statement 1, it's not a problem that "lossy compression does not imply AI" since the decompressing program would still "certainly demonstrate AI".

----- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 2:23 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Mark, could you please write a test program to objectively test for lossy text compression using your algorithm? You can start by listing all of the inconsistencies in Wikipedia. To make the test objective, you will either need a function to test whether two strings are inconsistent or not, or else you need to show that people will never disagree on this matter. Lossy compression does not imply AI. A lossy text compressor that did the same thing (recall it in paraphrased fashion) would certainly demonstrate AI.
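The CDM discussed in this thread is easy to try with an off-the-shelf compressor. The sketch below uses zlib as a crude stand-in for a real language model, so the absolute numbers are rough, but the ordering behaves as described: near 0.5 for identical strings, approaching 1.0 for unrelated ones.

```python
import zlib

def C(x: bytes) -> int:
    """Compressed size of x under a generic lossless model (zlib as a stand-in)."""
    return len(zlib.compress(x, 9))

def cdm(x: bytes, y: bytes) -> float:
    """Keogh's compression dissimilarity measure: C(xy) / (C(x) + C(y))."""
    return C(x + y) / (C(x) + C(y))

a = b"it is hot " * 20
b = b"quarterly earnings rose sharply " * 20
print(cdm(a, a))  # near 0.5: a string shares all of its information with itself
print(cdm(a, b))  # closer to 1.0: the two strings share little
```

A compressor with an actual language model would also pull CDM("it is hot", "it is very warm") below CDM("it is hot", "it is cold"); zlib, which only matches byte strings, cannot make that semantic distinction.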
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
Mark wrote: Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.

I'm glad we finally agree :-)

You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

To answer Phil's post: Text compression is only near the theoretical limits for small files. For large files, there is progress to be made integrating known syntactic and semantic modeling techniques into general purpose compressors. The theoretical limit is about 1 bpc and we are not there yet. See the graph at http://cs.fit.edu/~mmahoney/dissertation/

The proof that I gave that a language model implies passing the Turing test is for the ideal case where all people share identical models. The ideal case is deterministic. For the real case where models differ, passing the test is easier because a judge will attribute some machine errors to normal human variation. I discuss this in more detail at http://cs.fit.edu/~mmahoney/compression/rationale.html (text compression is equivalent to AI).

It is really hard to get funding for text compression research (or AI). I had to change my dissertation topic to network security in 1999 because my advisor had funding for that. As a postdoc I applied for a $50K NSF grant for a text compression contest. It was rejected, so I started one without funding (which we now have).
The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 5:00:47 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that? The problem is that Keogh's measure is intended for data-mining where you have separate instances, not one big entwined Gordian knot.

Now if only we had some test to tell which compressors have the best language models...

Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.

----- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 3:54 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

You could use Keogh's compression dissimilarity measure to test for inconsistency. http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf CDM(x,y) = C(xy)/(C(x)+C(y)), where x and y are strings, and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0 if x and y do not share any information. Then, CDM("it is hot", "it is very warm") < CDM("it is hot", "it is cold"), assuming your compressor uses a good language model. Now if only we had some test to tell which compressors have the best language models...
-- Matt Mahoney, [EMAIL PROTECTED] To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/[EMAIL PROTECTED]
Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
If dumb models kill smart ones in text compression, then how do you know they are dumb? What is your objective test of "smart"? The fact is that in speech recognition research, language models with a lower perplexity also have lower word error rates.

We have "smart" statistical parsers that are 60% accurate when trained and tested on manually labeled text. So why haven't we solved the AI problem? Meanwhile, a "dumb" model like matching query words to document words enables Google to answer natural language queries, while our smart parsers choke when you misspell a word. Who is smart and who is dumb?

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, August 16, 2006 9:17:52 AM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

Very nice example of "homunculus"/"turtles-all-the-way-down" reasoning.

The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

I believe that it's related to AI . . . . but that the dumbest models kill intelligent models every time . . . . which then makes AI useless for text compression. And bit-level text storage and reproduction is unnecessary for AI (and adds a lot of needless complexity) . . . . So why are we combining the two?

----- Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Tuesday, August 15, 2006 6:02 PM Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Mark wrote: Huh?
By definition, the compressor with the best language model is the one with the highest compression ratio.

I'm glad we finally agree :-)

You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

To answer Phil's post: Text compression is only near the theoretical limits for small files. For large files, there is progress to be made integrating known syntactic and semantic modeling techniques into general purpose compressors. The theoretical limit is about 1 bpc and we are not there yet. See the graph at http://cs.fit.edu/~mmahoney/dissertation/

The proof that I gave that a language model implies passing the Turing test is for the ideal case where all people share identical models. The ideal case is deterministic. For the real case where models differ, passing the test is easier because a judge will attribute some machine errors to normal human variation. I discuss this in more detail at http://cs.fit.edu/~mmahoney/compression/rationale.html (text compression is equivalent to AI).

It is really hard to get funding for text compression research (or AI). I had to change my dissertation topic to network security in 1999 because my advisor had funding for that. As a postdoc I applied for a $50K NSF grant for a text compression contest. It was rejected, so I started one without funding (which we now have).
The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

-- Matt Mahoney, [EMAIL PROTECTED]
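Since perplexity keeps coming up in this thread: it is just the exponential of the per-token cross-entropy, so ranking models by perplexity and ranking them by compressed bits per token are the same thing (an ideal coder spends log2(1/p) bits on a symbol of probability p). A toy sketch with made-up probabilities:

```python
import math

def perplexity(probs):
    """Perplexity of a model that assigned probability p to each token it observed."""
    bits = -sum(math.log2(p) for p in probs) / len(probs)  # cross-entropy, bits/token
    return 2 ** bits

# hypothetical per-token probabilities assigned by a sharp and a vague model
sharp = [0.5, 0.25, 0.5, 0.25]
vague = [0.1, 0.1, 0.1, 0.1]
print(perplexity(sharp))  # about 2.83 = 2**1.5
print(perplexity(vague))  # about 10, i.e. 1/p for a uniform model
```

This is why the speech recognition community's perplexity benchmarks and a compression contest measure the same underlying quantity.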
Re: [agi] Lossy ** lossless compression
The argument for lossy vs. lossless compression as a test for AI seems to be motivated by the fact that humans use lossy compression to store memory, and cannot do lossless compression at all. The reason is that lossless compression requires the ability to do deterministic computation. Lossy compression does not. So this distinction is not important for machines.

The proof that an ideal language model implies passing the Turing test requires a lossless model. A lossy model has only partial knowledge of the distribution of strings in natural language dialogs. Without full knowledge, it is not possible to duplicate the same distribution of equivalent representations of the same idea, allowing such expressions to be recognized as not human, even if the compression is ideal. For example, a lossy compressor might compress all of the following to the same code: "it is hot", "it is quite warm", "it is 107 degrees", "the burning desert sun seared my skin", etc. This distribution of expressions of equivalent (or almost equivalent) ideas is not uniform. Humans recognize that some expressions are more common than others, but an ideal lossy compressor is unable to regenerate the same distribution. (If it could, it would be a lossless model.) It only needs to know the sum of the probabilities for ideal compression.

This example brings up another issue. Who is to say if two expressions represent the same idea? The problem itself requires AI.

The proper way to avoid coding equivalent representations in an objective way is to remove all noise (e.g. misspelled words, grammatical errors, arbitrary line breaks) from the data set and put it in a canonical form, so there can only be one way to represent the ideas within. This would remove any distinction between lossy and lossless compression. However, it would be a gargantuan task. It would take a lifetime to read 1 GB of text. But by using Wikipedia, most of this work has already been done.
There are very few spelling or grammar errors due to extensive review, and there is a rather uniform style. Line breaks only occur on paragraph boundaries.

Uncompressed video would be the absolute worst type of test data. Uncompressed video is about 10^8 to 10^9 bits per second. The human brain has a long term learning rate of around 10 bits per second. So all the rest is noise. How are you going to remove that prior to compression?

There are no objective functions to compare the quality of lossy decompression. For images, we have PSNR, which is the RMS error of the pixel differences between the original and reconstructed images. But this is a poor measure. For example, if I increased the brightness of all pixels by 1%, you would not see any difference. However, if I increased the brightness of just the top half of the image by 1%, then the PSNR would be reduced by 50% but there would be an obvious horizontal line across the image. Any test of lossy quality has to be subjective.

This is not to say that investigating how humans do lossy compression isn't an important field of study. I think it is essential to understanding how vision, hearing, and the other senses work and how that data is processed. We currently do not have good models to describe how humans decide what to remember and what to discard.

But the Hutter prize is to motivate better language models, not vision or hearing or robotics. For that task, I think lossless text compression is the right approach.

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: boris [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, August 19, 2006 10:25:58 PM Subject: [agi] Lossy ** lossless compression

It's been said that we have to go after lossless compression because there's no way to objectively measure the quality of lossy compression. That makes sense only in the context of dumb indiscriminate transforms conventionally used for compression.
If compression is produced by pattern recognition, we can quantify lossless compression of individual patterns, which is a perfectly objective criterion for selectively *losing* insufficiently compressed patterns. To make Hutter's prize meaningful it must be awarded for compression of the *best* patterns, rather than of the whole data set. And, of course, linguistic/semantic data is a lousy place to start; it's already been heavily compressed by "algorithms" unknown to any autonomous system. An uncompressed movie would be a far, far better data sample. Also, the real criterion of intelligence is prediction, which is a *projected* compression of future data. The difference is that current compression is time-symmetrical, while prediction obviously isn't.
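Matt's point about a lossy model knowing only the sum of probabilities can be made concrete. With invented probabilities for four paraphrases of one idea, an ideal lossless coder assigns each expression its own log2(1/p)-bit code, while a lossy coder that merges the whole equivalence class spends only log2(1/sum) bits — a shorter code, but one from which the within-class distribution can never be regenerated:

```python
import math

# invented probabilities for four expressions of roughly the same idea
p = {
    "it is hot": 0.050,
    "it is quite warm": 0.030,
    "it is 107 degrees": 0.010,
    "the burning desert sun seared my skin": 0.001,
}

# lossless: each expression gets its own ideal code of log2(1/p) bits
for s, pi in p.items():
    print(f"{math.log2(1 / pi):5.2f} bits  {s!r}")

# lossy: one code for the whole equivalence class, log2(1/sum) bits
total = sum(p.values())
print(f"{math.log2(1 / total):5.2f} bits  (one code for the whole class)")
```

The gap between the class code and the individual codes is exactly the information the lossy model discards, and exactly what it would need in order to imitate the human distribution of paraphrases.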
Re: [agi] Lossy ** lossless compression
Humans can do lossless compression but do it badly. Since human memory is inherently lossy, we must add error correcting information, which increases mental effort (storage cost, and thus learning time). Recalling text verbatim is harder than paraphrasing. It requires the mental equivalent of storing several identical copies.

Humans can also execute arbitrary algorithms, but not efficiently. So it is possible to do things like send Morse code, which compresses text by using shorter codes for the most common letters. But this is not making use of our built-in language model. The sender and receiver have to agree on a learned, predefined code (although the code is based on a crude model). Learning such codes requires extra effort (storage) so that the signal can be decoded without errors. Ironically, the receiver still uses his language model for error correction outside the scope of the Morse code decompression algorithm. If a signal is ambiguous as to whether a beep is a dot or a dash, the receiver can usually guess correctly by considering context. Machines can't do this. Decoding telegraph signals sent by humans is a hard problem for machines.

Now, one may interpret this as an argument that lossless compression is unrelated to AI and we should use lossy compression as a test instead. No, I am not arguing that. Humans make very good use of their imprecise language models for text prediction and error correction. Those are the qualities that we want to emulate in AI. A machine can make a model precise at no extra cost, enabling us to use text compression to objectively measure these qualities. Researchers in speech recognition have been using this approach for the last 15 years.

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: J. Andrew Rogers [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, August 22, 2006 12:58:04 AM Subject: Re: [agi] Lossy ** lossless compression

On Aug 20, 2006, at 11:15 AM, Matt Mahoney wrote: The argument for lossy vs.
lossless compression as a test for AI seems to be motivated by the fact that humans use lossy compression to store memory, and cannot do lossless compression at all. The reason is that lossless compression requires the ability to do deterministic computation. Lossy compression does not.

I think this needs to be qualified a bit more strictly in real (read: finite) cases. There is no evidence that humans are incapable of lossless compression, only that lossless compression is far from efficient and humans have resource bounds that generally encourage efficiency. A distinction with a difference. Being able to recite a text verbatim is a different process than reciting a summary of its semantic content, and humans can do both. Even a probabilistic (e.g. Bayesian) computational model can reinforce some patterns to the point where all references to that pattern will be perfect in all contexts over some finite interval. I expect it would be trivial to prove a decent probabilistic model has just such a property over any arbitrary finite interval for any given pattern with proper reinforcement.

I do not disagree that measurement of lossy models is a significant practical issue for the purposes of a contest. But on the other hand, lossless models demand certain levels of inefficiency that a useful intelligent system would not exhibit, and which impacts the solution space by how poorly these types of algorithms scale generally. If we knew an excellent lossless algorithm could fit within the resource constraints common today such that a lossy algorithm was irrelevant to the contest, I would expect a contest would be unnecessary. Which is not to say that I think the rules should be changed, just that this is quite relevant to the bigger question.

Cheers, J.
Andrew Rogers
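The Morse-code point in Matt's message — shorter codes for the most common letters — is exactly what Huffman coding automates. A minimal sketch, not tied to any compressor in the thread, that builds prefix codes from symbol frequencies:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build prefix codes; frequent symbols get shorter codes (the Morse idea)."""
    freq = Counter(text)
    # heap entries: (count, tiebreak, symbols-in-subtree)
    heap = [(n, i, [c]) for i, (c, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {c: "" for c in freq}
    i = len(heap)
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, syms2 = heapq.heappop(heap)
        for c in syms1:
            codes[c] = "0" + codes[c]       # extend codes as the subtrees merge
        for c in syms2:
            codes[c] = "1" + codes[c]
        heapq.heappush(heap, (n1 + n2, i, syms1 + syms2))
        i += 1
    return codes

codes = huffman_codes("eeeeeeeeettttaaz")
print(codes)  # 'e' is most frequent, so its code is shortest
```

Unlike Morse, whose code is fixed in advance, the code here is derived from the data — which is why, as Matt notes, sender and receiver must share the model deterministically to decode without errors.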
Re: [agi] Lossy ** lossless compression
As I stated earlier, the fact that there is normal variation in human language models makes it easier for a machine to pass the Turing test. However, a machine with a lossless model will still outperform one with a lossy model because the lossless model has more knowledge. I agree it is important to understand how the human brain filters information (lossy compression), especially vision and hearing. This does not change the fact that lossless compression is the right way to evaluate a language model. A lossy model cannot be evaluated objectively. I guess we will have to agree to disagree.

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, August 25, 2006 12:31:06 PM Subject: Re: [agi] Lossy ** lossless compression

On 8/20/06, Matt Mahoney [EMAIL PROTECTED] wrote: The argument for lossy vs. lossless compression as a test for AI seems to be motivated by the fact that humans use lossy compression to store memory, and cannot do lossless compression at all. The reason is that lossless compression requires the ability to do deterministic computation. Lossy compression does not. So this distinction is not important for machines.

No; the main argument is that lossy compression allows the use of much, much more sophisticated, and much, much more powerful compression algorithms, achieving much higher compression ratios. Also, lossless compression is already nearly as good as it can be. Statistical methods will probably outperform intelligent methods on lossless compression, especially if the size of the compressor is included.

The proof that an ideal language model implies passing the Turing test requires a lossless model. A lossy model has only partial knowledge of the distribution of strings in natural language dialogs.
Without full knowledge, it is not possible to duplicate the same distribution of equivalent representations of the same idea, allowing such expressions to be recognized as not human, even if the compression is ideal.

By this argument, no human can pass the Turing test, since none of us have the same distributions, either. Or perhaps just one human can pass it. Presumably Turing. You will never, never, never, never recreate the same exact language model in a computer as resides in any particular human. Losslessness is relevant only when you need to recreate it exactly, and you can't, so it's irrelevant.
Re: [agi] Lossy ** lossless compression
----- Original Message ----- From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, August 25, 2006 5:58:02 PM Subject: Re: [agi] Lossy ** lossless compression

However, a machine with a lossless model will still outperform one with a lossy model because the lossless model has more knowledge.

PKZip has a lossless model. Are you claiming that it has more knowledge? More data/information *might* be arguable but certainly not knowledge -- and PKZip certainly can't use any knowledge that you claim that it has.

DEL has a lossy model, and nothing compresses smaller. Is it smarter than PKZip? Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not distinguishable then this extra knowledge is not important. But all text strings are distinguishable to humans.

But let me give an example of what we have already learned from lossless compression tests.

1. PKZip, bzip2, ppmd, etc. model text at the character (n-gram) level.
2. WinRK and paq8h model text at the lexical level using static dictionaries. They compress better than (1).
3. xml-wrt|ppmonstr and paq8hp1 model text at the lexical level using dictionaries learned from the input. They compress better than (2).

I think you can see the pattern. There has been research in semantic models using distant bigrams and LSA. These compress cleaned text (restricted vocabulary, no punctuation) better than models without these capabilities, as measured by word perplexity. Currently there are no general purpose compressors that model syntax or semantics, probably because such models are only useful on large text corpora, not the kind of files people normally compress. I think that will change if there is a financial incentive.
This does not change the fact that lossless compression is the right way to evaluate a language model.

. . . . in *your* opinion. I might argue that it is the *easiest* way to evaluate a language model but certainly NOT the best -- and I would then argue, therefore, not the right way either.

Also in the opinion of speech recognition researchers studying language models since the early 1990's.

A lossy model cannot be evaluated objectively.

Bullsh*t. I've given you several examples of how. You've discarded them because you felt that they were too difficult and/or you didn't understand them.

Deciding if a lossy decompression is close enough is an AI problem, or it requires subjective judging by humans. Look at benchmarks for video or audio codecs. Which sounds better, AAC or Ogg?

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Lossy ** lossless compression
First let me respond to Boris and Mark. I agree. Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression. This will be hard, but Boris made an important observation that useful data is generally compressible and useless data (noise) is not. I don't think the problem can be solved completely, but there is clearly room for improvement.

Eliezer suggests putting a model of the universe on a USB drive and then running the model to predict how many fingers he is holding up. Let's assume that is possible. Stephen Wolfram suggests the model, if one exists, might only be a few lines of code. http://en.wikipedia.org/wiki/A_New_Kind_of_Science But we must solve a few other problems first.

1. It may be hard to find such a model. We cannot tell whether the apparent randomness of quantum mechanics is truly random or generated by a deterministic, but random-appearing, process. This happens in cryptography. The only way to distinguish between true random data and an encrypted block of zero bits is to break the encryption. The former is not compressible, the latter is.

2. Assuming we solve this mystery of the universe and it turns out to be deterministic, we still have the problem of running the code on a computer that resides within the universe. If the universe is infinite, then it is possible because one Turing machine can simulate another. If the universe is finite (as quantum theory and the Big Bang suggest, also the lack of real Turing machines), then it is not possible because a state machine cannot simulate itself. Having the USB drive simulate all of the universe except itself would resolve this problem, but then if the USB drive resides outside the universe, how do we read the result?

3. Assuming we overcome this obstacle, it may be that the program will say how many fingers, but in that case the program also completely determines my behavior and might not allow me to answer.
-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----- From: Eliezer S. Yudkowsky [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, August 25, 2006 8:08:02 PM Subject: Re: [agi] Lossy ** lossless compression

Matt Mahoney wrote: DEL has a lossy model, and nothing compresses smaller. Is it smarter than PKZip? Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not distinguishable then this extra knowledge is not important. But all text strings are distinguishable to humans.

Suppose I give you a USB drive that contains a lossless model of the entire universe excluding the USB drive - a bitwise copy of all quark positions and field strengths. (Because deep in your heart, you know that underneath the atoms, underneath the quarks, at the uttermost bottom of reality, are tiny little XML files...) Let's say that you've got the entire database, and a Python interpreter that can process it at any finite speed you care to specify. Now write a program that looks at those endless fields of numbers, and says how many fingers I'm holding up behind my back. Looks like you'll have to compress that data first.

-- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
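Boris's observation, endorsed above, that useful data is generally compressible while noise is not, is easy to demonstrate with any lossless compressor (with the cryptography caveat from point 1: an encrypted block would also look incompressible). A sketch using zlib:

```python
import os
import zlib

def compressibility(data: bytes) -> float:
    """Compressed/original size ratio; near (or above) 1.0 means noise-like."""
    return len(zlib.compress(data, 9)) / len(data)

structured = b"the quick brown fox jumps over the lazy dog " * 100
noise = os.urandom(len(structured))  # stands in for true randomness

print(compressibility(structured))  # far below 1.0: redundant, useful structure
print(compressibility(noise))       # about 1.0: nothing for the model to exploit
```

This ratio is the crude version of the test that separates Wikipedia text (highly compressible) from uncompressed video noise in Matt's earlier argument.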
Re: [agi] Lossy ** lossless compression
I think that either putting Wikipedia in canonical form, or recognizing that it is in canonical form, are two equally difficult problems. So the problem does not go away easily.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Saturday, August 26, 2006 4:51:07 PM
Subject: Re: [agi] Lossy ** lossless compression

 Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression.

Hmmm. Interesting . . . . Actually, I didn't suggest exactly that -- though I can see how you got that impression. I suggested that the decompression program should output the Wikipedia in canonical form, meaning that it would be lossy as far as information is concerned (i.e. it loses the exact bit sequence of the input) but it would be lossless as far as knowledge is concerned. Putting the Wikipedia in a canonical form (or -- developing a good canonical form to put the Wikipedia into) strikes me as the largest part of the challenge (and thus, not something that you want to -- or should -- take on as contest organizers).
Re: [agi] Lossy ** lossless compression
Suppose I claim that text8.zip, available at http://cs.fit.edu/~mmahoney/compression/textdata.html, is in canonical form. The procedure and a program for generating it are described at the bottom of that page. The output consists of only the lowercase letters a-z and spaces. If you claim that this is not in canonical form, then prove it. Specify a criterion for canonical form, a pass/fail test. I want an algorithm or a program, no hand waving or generalities. Input an arbitrary string, output yes or no. Do you see my point now?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Saturday, August 26, 2006 8:52:27 PM
Subject: Re: [agi] Lossy ** lossless compression

 I think that either putting Wikipedia in canonical form, or recognizing that it is in canonical form, are two equally difficult problems. So the problem does not go away easily.

Um. I think you missed my point. The compression program should be able to take the Wikipedia in its current form and the decompression program should be able to output it in canonical form. Make the contestants do all the difficult work, not the judges. (And recognizing canonical form should be easy; ensuring its completeness is likely to be a real problem, but that's what you have the other contestants for . . . . :-)
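The a-z-and-spaces output format can be sketched in a few lines. This is only an illustration of that output format, not the actual program described on the textdata page (which also handles digits, wiki markup, and more):

```python
import re

def normalize(text: str) -> str:
    """Reduce text to lowercase letters a-z separated by single spaces.

    A simplified sketch of the output format described above; the real
    preprocessing script on the textdata page does considerably more
    (digit spelling, markup removal, etc.).
    """
    text = text.lower()
    text = re.sub(r"[^a-z]+", " ", text)  # every run of non-letters becomes one space
    return text.strip()

print(normalize("Hello, World!  123"))  # -> "hello world"
```

Note that this mapping is many-to-one: distinct inputs can produce the same output, which is exactly why whether such a form counts as "canonical" is the point under dispute.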
Re: [agi] Lossy ** lossless compression
Mark, I didn't get your attachment, the program that tells me if an arbitrary text string is in canonical form or not. Actually, if it will make it any easier, I really only need to know if a string is a canonical representation of Wikipedia.

Oh, wait... there can only be one canonical form. I guess then all you have to do is store the canonical form and compare the input with it.

After you solve this simple, easy problem and send me the program, I will solve the much harder problem of converting Wikipedia to canonical form.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Sunday, August 27, 2006 11:30:44 AM
Subject: Re: [agi] Lossy ** lossless compression

 Suppose I claim that text8.zip available at http://cs.fit.edu/~mmahoney/compression/textdata.html is in canonical form.

I reject your nonsensical claim.

 If you claim that this is not in canonical form, then prove it. Specify a criterion for canonical form, a pass/fail test.

By definition, a canonical form should not have duplication. Your data has massive duplication (particularly when looked at on the knowledge level) and is therefore not canonical. Simple enough for you?

 Do you see my point now?

No, all I see is that you're so invested in lossless (at the bit-level) compression that you're not even willing to try to work to get past it.
Re: [agi] Lossy ** lossless compression
In showing that compression implies AI, I first make the simplifying assumption that everyone shares the same language model. Then I relax that assumption and argue that this makes it easier for a machine to pass the Turing test.

But I see your point. I argued that a lossless model knows everything that a lossy model does, plus more, because the lossless model knows p(x) and p(x'), while a lossy model only knows p(x) + p(x'). However, I missed that the lossy model knows that x and x' are equivalent, while the lossless model does not. However, I think that a lossless model can reasonably derive this information by observing that p(x, x') is approximately equal to p(x) or p(x'). In other words, knowing both x and x' does not tell you any more than x or x' alone, or CDM(x, x') ~ 0.5. I think this is a reasonable way to model lossy behavior in humans.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: Philip Goetz [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Sunday, August 27, 2006 9:23:25 PM
Subject: Re: [agi] Lossy ** lossless compression

On 8/25/06, Matt Mahoney [EMAIL PROTECTED] wrote:
 As I stated earlier, the fact that there is normal variation in human language models makes it easier for a machine to pass the Turing test. However, a machine with a lossless model will still outperform one with a lossy model because the lossless model has more knowledge.

That would be true only if there were one correct language model, AND you knew what it was. Besides which, every human has a lossy model. It seems to me that by your argument, a machine with a lossless model would out-perform a human, and thus /fail/ the Turing test.

- Phil
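The message does not spell out the CDM formula, so as an assumption take the compression-based dissimilarity measure C(xy) / (C(x) + C(y)), which approaches 0.5 when two strings carry the same information and 1.0 when they are unrelated. A sketch using zlib (a byte-level compressor only sees literal overlap, so this illustrates the mechanics rather than semantic equivalence):

```python
import zlib

def cdm(x: bytes, y: bytes) -> float:
    """Compression dissimilarity measure: C(xy) / (C(x) + C(y)).

    Near 0.5 when y adds no information beyond x (compressing the
    concatenation costs little more than compressing x alone);
    near 1.0 when the two strings are unrelated.
    """
    comp = lambda s: len(zlib.compress(s, 9))
    return comp(x + y) / (comp(x) + comp(y))

a = b"jim is extremely fat " * 100
b = b"james continues to be morbidly obese " * 60

print(cdm(a, a) < 0.7)        # duplicated information: ratio near 0.5
print(cdm(a, a) < cdm(a, b))  # literally different strings score as more dissimilar
```

A semantically adequate model would also score the two paraphrases above near 0.5; zlib cannot, which is exactly the gap between current compressors and the knowledge-level equivalence under discussion.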
Re: [agi] Lossy ** lossless compression
On 8/28/06, Mark Waser wrote:
 How does a lossless model observe that "Jim is extremely fat" and "James continues to be morbidly obese" are approximately equal?

I realize this is far beyond the capabilities of current data compression programs, which typically predict the next byte in the context of the last few bytes using learned statistics. Of course we must do better. The model has to either know, or be able to learn, the relationships between "Jim" and "James", "is" and "continues to be", "fat" and "obese", etc. I think a 1 GB corpus is big enough to learn most of this knowledge using statistical methods.

C:\res\data\wiki>grep -c . enwik9
File enwik9: 10920493 lines match
grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c fat enwik9
File enwik9: 1312 lines match
grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c obese enwik9
File enwik9: 111 lines match
grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i obese enwik9 |grep -c fat
File STDIN: 14 lines match

So we know that "obese" occurs in about 0.001% of all paragraphs, but in 1% of paragraphs containing "fat". This is an example of a distant bigram model, which has been shown to improve word perplexity in offline models [1]. We can improve on this method using e.g. latent semantic analysis [2] to exploit the transitive property of semantics: if A appears near (means) B and B appears near C, then A predicts C. Likewise, syntax is learnable. For example, if you encounter "the X is" you know that X is a noun, so you can predict "a X was" or "Xs" rather than "he X" or "Xed". This type of knowledge can be exploited using similarity modeling [3] to improve word perplexity. (Thanks to Rob Freeman for pointing me to this.)

Let me give one more example using the same learning mechanism by which syntax is learned:

All men are mortal. Socrates is a man. Therefore Socrates is mortal.
All insects have 6 legs. Ants are insects. Therefore ants have 6 legs.
Now predict: All frogs are green. Kermit is a frog. Therefore...

[1] Rosenfeld, Ronald, A Maximum Entropy Approach to Adaptive Statistical Language Modeling, Computer Speech and Language, 10, 1996.
[2] Bellegarda, Jerome R., Speech recognition experiments using multi-span statistical language models, IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 717-720, 1999.
[3] Ido Dagan, Lillian Lee, Fernando C. N. Pereira, Similarity-Based Models of Word Cooccurrence Probabilities, Machine Learning, 1999. http://citeseer.ist.psu.edu/dagan99similaritybased.html

-- Matt Mahoney, [EMAIL PROTECTED]
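The grep counts above amount to estimating P(obese) and P(obese | fat) from paragraph-level co-occurrence. A toy sketch of the same computation (the eight "documents" below are invented; the real experiment used enwik9):

```python
# Toy paragraph collection; the documents are made up for illustration only.
docs = [
    "jim is extremely fat",
    "the fat cat is obese",
    "a thin dog runs",
    "birds fly south",
    "the sun is bright",
    "rain falls on the hills",
    "fat and obese mean much the same",
    "a quiet library",
]

def contains(doc, word):
    return word in doc.split()

n = len(docs)

# Unconditional probability that a document mentions "obese".
p_obese = sum(contains(d, "obese") for d in docs) / n

# Conditional probability, restricted to documents mentioning "fat".
fat_docs = [d for d in docs if contains(d, "fat")]
p_obese_given_fat = sum(contains(d, "obese") for d in fat_docs) / len(fat_docs)

print(p_obese)            # 2/8 = 0.25
print(p_obese_given_fat)  # 2/3: seeing "fat" raises the probability of "obese"
```

The jump from the unconditional to the conditional probability is the signal a distant bigram model exploits.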
Re: [agi] Vision
I would like to make some very general comments on AGI. Marcus Hutter's AIXI shows that in a very general sense, the optimal behavior of a goal seeking agent at each point in time is to guess that the environment is simulated by the shortest Turing machine consistent with all observations so far. This is not a solution to AGI because the problem is not computable, and it is intractable even in the restricted domain of space and time bounded environments. Rather, AIXI is a unifying framework for all machine learning algorithms, such as neural networks, SVM, Bayes, decision trees, GA, clustering, whatever. The implied goal of each is to find the simplest hypothesis that fits the data. More formally, they all use tractable algorithms to search a subset of short Turing machines for those consistent with the training data.

Let us consider the subset of learning problems that are important to humans (vision, language, robotics, etc). Ben Goertzel made an important observation, that AGI = pattern recognition + goals. These correspond to the two learning mechanisms in animal brains, classical conditioning + operant conditioning. AGI is unsolved, so we cannot yet say which method should be used. But I should comment that the one working model we do have (the human brain) is best modeled by neural networks. I believe the reason that neural networks have so far failed to solve the problem is lack of computing power (10^13 connections, 10^14 connections/second) and lack of corresponding training data (several years of video, audio, etc). Google has this much computing power and data. It can already answer simple natural language queries like "how many days until Xmas?".

There have been lots of smart people working on AI for the last 50 years. By 1960 we had already seen language translation, handwriting recognition, natural language query answering in restricted domains, chess playing, automatic theorem proving, etc.
If there was a shortcut to the general problem that didn't require massive computing power, I think we would have found it by now.

On 9/4/06, YKY (Yan King Yin) [EMAIL PROTECTED] wrote:
 I think the essence of Hawkins' theory (his HTM [hierarchical temporal memory] model) is the compression of sensory experience via pattern recognition. Sensory experience goes in; condensed episodic memory comes out. He does this with neural networks. I worked with NNs for a while along exactly the same line of thought. After a while I just decided that NN is too difficult to work with, so I switched to (predicate) logic as the substrate for pattern recognition. Take an example: John hits Mary. Mary kicks John. Mary kicks John again. John hits Mary again. Etc, etc. The point is to recognize that John and Mary are fighting, thus achieving compression. The fighting pattern can be irregular, consisting of X hit Y, Y kick X, etc. With logic I can write down a rule for recognizing this pretty easily, mainly due to the use of symbolic variables. So you see the compressive power of logic. NN is just too clumsy to work with. Although we know that the brain somehow must perform this information compression with neurons, we just don't understand the mechanisms yet. Let's say the goal is to compress visual inputs to the "John hits Mary" level. I think it can be done using my vision scheme plus a logical knowledge representation. But with NN, this still seems very very remote.

There is a statistical language modeling solution to this problem. Counting Google hits:

  the: 25,250,000,000 (at least this many English web pages)
  fight: 603,000,000
  hit kick: 49,700,000
  hit kick fight: 18,500,000

So "fight" occurs on about 2.3% of all web pages, but 37% of web pages containing "hit" and "kick". In fact, you could get similar numbers if your training sample had 1/1,000,000 as many pages.
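The arithmetic behind those percentages, using the figures quoted above (2006-era page counts, so treat them as illustrative):

```python
# Page counts quoted in the message above.
total_pages    = 25_250_000_000   # pages containing "the"
fight          =    603_000_000
hit_kick       =     49_700_000
hit_kick_fight =     18_500_000

# Fraction of all pages mentioning "fight": a few percent.
p_fight = fight / total_pages

# Fraction of "hit kick" pages that also mention "fight": much higher.
p_fight_given_hit_kick = hit_kick_fight / hit_kick

print(round(100 * p_fight_given_hit_kick))  # 37
```

The order-of-magnitude jump from the unconditional to the conditional frequency is what licenses the inference that hitting and kicking suggest fighting.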
But to use a smaller training set than that, you would have to use techniques like LSA to exploit the transitive property of semantics. If there are no documents containing both "hit" and "fight", then you could still infer the relationship from documents containing both "hit" and "punch" plus documents containing both "punch" and "fight".

LSA (latent semantic analysis) is described as the factoring of a word-document matrix into 3 matrices by SVD (singular value decomposition), where the middle matrix is diagonal, then discarding all but a few hundred of the largest diagonal terms. This greatly reduces the storage requirement (i.e. a simpler model). Furthermore, the SVD is equivalent to a 3 layer linear neural network with the layers representing words, an abstract semantic space, and documents. Not that SVD is fast...

-- Matt Mahoney, [EMAIL PROTECTED]
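The factoring described above can be sketched with numpy on a tiny example. The word-document counts are invented; the point is only the mechanics: factor the matrix, keep the k largest singular values, and compare words in the reduced semantic space:

```python
import numpy as np

# Tiny word-document count matrix (rows = words, columns = documents).
# The counts are made up for illustration.
words = ["hit", "kick", "punch", "fight", "green"]
X = np.array([
    [2, 1, 0, 0],   # hit
    [1, 2, 0, 0],   # kick
    [0, 1, 1, 0],   # punch
    [1, 1, 2, 0],   # fight
    [0, 0, 0, 3],   # green
], dtype=float)

# Factor X = U S Vt, then keep only the k largest singular values,
# as in the LSA description above.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Word vectors in the reduced semantic space (rows of U scaled by S).
emb = U[:, :k] * S[:k]
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "hit" should be closer to "fight" than to "green" in the reduced space.
print(cos(emb[0], emb[3]) > cos(emb[0], emb[4]))
```

Words that never co-occur directly can still end up close in the reduced space through shared neighbors, which is the transitive property being exploited.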
Re: [agi] G0 theory completed
The issue of control over an AGI was discussed in the singularity mailing list. The question was whether it is possible to guarantee that an AGI will be friendly. It was hotly debated with no consensus.

My position is that once you make machines that are smarter than humans, and they do the same, you cannot guarantee anything. This limitation is fundamental, in the same way that you cannot predict if a Turing machine will halt. I cited two papers by Hutter and Legg to support this. Hutter's paper on AIXI proves that the optimal behavior of a rational agent (as a Turing machine) with the goal of maximizing the accumulated reward signal from an unknown interactive environment is to guess that the environment is simulated by the shortest Turing machine consistent with past observation. Legg's paper on the limits of learnability proves that the shortest Turing machine capable of learning to predict the output of another machine of Kolmogorov complexity n is between n and n + log n.

Taken together, the papers explain a lot about the nature of uncertainty in a deterministic universe (as Einstein asserted, in spite of quantum mechanics). Hutter's proof requires the assumption that the universe be computable by a Turing machine. I think his paper (which essentially proves Occam's Razor) would not be so compelling if the universe were not in fact computable, or a simulation. The source of uncertainty is therefore due to the universe having greater Kolmogorov complexity than your brain.

Your programming example illustrates this nicely. You can't understand a 30,000 line program all at once, so you divide it into modules with well defined interfaces. You can develop, test, debug, model, predict, etc. one small module while treating the rest of the program as unpredictable, even though you know it is really deterministic. If you didn't model the program this way, you wouldn't need to check function arguments or throw exceptions.
So you are really supporting my argument that you cannot predict (and therefore cannot control) an AGI.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: David Clark [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Friday, October 6, 2006 6:03:58 PM
Subject: Re: [agi] G0 theory completed

Matt Mahoney said:
 If you can't model an AGI in your head, then you can't program it, understand it, test it, control it, or predict what it will do either.

Can't model / can't program - I have programmed a number of large (30,000+ lines of C) systems and I couldn't disagree more with the above statement. At any one time, I can only think about 10 or fewer details of any program at a time. It is only because of the organization of my code that I am able to make a program that actually works. As the program grows, the levels of abstraction grow so that (from my personal experience) I see no limit to the size or complexity of code that I can create or work on. I rely, for much of the detail memory, on the source code and I don't try to model all the details at once at all. I have other tools that help with higher levels of abstraction.

Understand - I "understand" all parts (in isolation) of all programs that I write, but for some of the largest ones, I can't predict (without tools) what the system will always do without actually running the program. If I wanted to know exactly, then I would probably just run the program and find out for sure.

Testing - There are many methods of testing and I see no limit to the size of code that can be tested. Microsoft Windows is the largest set of programs in the world and it can be argued that they are not fully tested, but obviously enough for most people to use the programs.

Control - Control has special meaning when talking about an AGI. If you truly had an AGI, would you have any more control over it if you could totally model or understand the AGI versus not?
Predict - We can't predict how most people will react and think even if we have known them our whole lives. If an AGI had intelligence on par with humans, how could we expect to always predict what they would think or do when we can't do that with ourselves?

What limit is there to knowledge in general? What tiny fraction of that knowledge can any single person embody? I don't know, but it must be a tiny fraction of the knowledge we currently have, and our rate of acquiring new knowledge is increasing exponentially. What limit is there to a database given an increasingly growing memory store? I think the answer, if not infinite, must be many orders of magnitude larger than what a single human is capable of.

Using a human mind as an example of the only way an AGI could possibly be created is flawed. Humans might have a huge number of limits to learning that AGIs do not. The hardware for an AGI is not limited to any specific amount, and it is also not limited to current hardware or algorithms. A human is limited to the brain in his skull and only
Re: [agi] G0 theory completed
My concern about G0 is that the problem of integrating first order logic or structured symbolic knowledge with language and sensory/motor data is unsolved, even when augmented with weighted connections to represent probability and/or confidence (e.g. fuzzy logic, Bayesian systems, connectionist systems). I think such weighting is an improvement but something else is still missing. People have been working on weighted graphs in various forms for over 20 years. If there was an easy solution, we should have found it by now. I did not see any proposed solution in G0.

First order logic is powerful, but that does not mean it is correct. I think it is an oversimplification, and we are discarding something essential for the sake of computational efficiency. The fact that you can represent Kicks(x,y) means that you can represent nonsense statements like "ball kicks boy". This is not how people think. A person reading such a statement will probably reverse the order of the words because it makes more sense that way. How would a symbolic system do that?

I think AGI will be solved when we do two things. First, we must understand what is going on in the human brain. Second, we must build a system with enough hardware to simulate it properly.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message -
From: YKY (Yan King Yin) [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Monday, October 9, 2006 2:23:59 PM
Subject: Re: [agi] G0 theory completed

Matt: (Sorry about the delay... I was busy advertising in other groups..)

 But now that you have completed your theory on how to build AGI, what do you do next? Which parts will you write yourself and which parts will you contract out?

Ideally, any part that can be "out-sourced" should be out-sourced. At this stage let's see who are interested in this approach...?
 I still think there are some fundamental problems to be solved. Your system is based on first order logic. (You said that is not a fixed design feature, but without a data structure you don't have a design.) I am not aware of any system that has successfully integrated FOL (or its augmented variants) with sensory/motor data or language.

For sensory processing, I think the main reason is that FOL is not probabilistic. We need to combine probability with FOL, which is not that hard. A Bayesian network can be viewed as propositional logic + probability. For natural language, perhaps the reason is that they have only focused on inference and ignored pattern recognition, which, as I argued, is the basis of dealing with the semantics of words.

 All such systems require human programmers to explicitly encode knowledge. You have many examples of how various types of knowledge can be represented. Books and papers on knowledge representation are full of similar examples. What these examples all lack is an explicit algorithm for acquiring such knowledge. Sure, humans can do it easily, but if you make your learning mechanism this smart, then you have already solved AGI. If it took anything less than human knowledge to do it, then surely systems like Cyc would have been built this way. Why spend 20 years hand coding millions of rules instead of a few days crunching a terabyte of text off the Internet? I think there is no shortcut to knowledge acquisition. Doug Lenat has argued that much of common sense knowledge is missing from the Internet. For example, a Google search of "water flowing downhill" returns 987 hits versus 1480 hits for "water flowing uphill". He argued that adult speech assumes common sense knowledge and thus is a bad source of common sense.

There are multiple reasons why Cyc is not yet successful -- lack of sensory input (vision), not good enough to converse in natural language, inference engine not advanced enough, no probabilities or fuzzy logic, etc.
It's like the failure of early gliders to fly, which doesn't mean that flying with planes is impossible. Successful language models like Google are based on statistical models, not FOL. Likewise, successful applications in vision or robotics generally use numerical/signal processing/neural models. A FOL formula such as "Sexy(x) ^ Intelligent(x)" is not unlike a neuron that detects 2 weighted inputs. If you add probabilities to FOL then they are even more similar. But logic is more powerful because it can use variables. For example, "Kicks(x,y)" is very, very difficult to express with statistical models or NNs, because it has to match "John kicks Mary", "boy kicks dog", "robot kicks ball", etc. I think to succeed at AGI, we need to understand the theoretical limits of learning [1,2], then develop a system not based on methods that have already been shown not to work. Then build a system that can learn, give it enough raw data to do so, and set it loose. First of all, we have to have a wine cup that is capable of holding the wine (knowledge). In other words, we need an architecture.
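The point about variables can be made concrete with a toy sketch. Everything here is illustrative (the matcher, the lowercase-means-variable convention, and the facts are invented for this example, not taken from G0 or any system discussed above): a single pattern with variables matches an open-ended set of ground facts, where a purely statistical model would need separate statistics for every word combination.

```python
def match(pattern, fact):
    """Unify a pattern like ('Kicks', 'x', 'y') with a ground fact.
    By convention here, all-lowercase symbols are variables.
    Returns a binding dict on success, or None on failure."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.islower():                    # variable
            if bindings.get(p, f) != f:    # same variable must bind consistently
                return None
            bindings[p] = f
        elif p != f:                       # constant must match exactly
            return None
    return bindings

# One pattern covers arbitrarily many agent/patient pairs.
facts = [("Kicks", "John", "Mary"),
         ("Kicks", "boy", "dog"),
         ("Kicks", "robot", "ball")]
for fact in facts:
    print(match(("Kicks", "x", "y"), fact))
```

A statistical model would have to see (or smooth over) each subject-verb-object triple separately; the pattern generalizes for free, which is the expressive power being claimed for FOL.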
Re: [agi] G0 theory completed
Do you really write 30,000 line programs without writing any error handling code? My argument for the unpredictability of AGI is based on Legg's paper [1]. It proves that a Turing machine cannot predict another machine with greater Kolmogorov complexity. Here I am equating Kolmogorov complexity with intelligence. I think that is reasonable. We already cannot predict what a 30,000 line program will do. During development, we break it down into small modules and work on them one at a time while modeling the rest of the program abstractly. Any simplified, abstract model (one whose Kolmogorov complexity is less than that of the system modeled) must be probabilistic, an approximation. This is easily proven: if the model were exact, then our original assumption about the complexity of the system must have been wrong. [1] Legg, Shane (2006), Is There an Elegant Universal Theory of Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: David Clark [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 10, 2006 11:20:16 AM Subject: Re: [agi] G0 theory completed You misinterpret my response. I never said I didn't understand the 30,000 line program. I said I can't think about the details embodied by the code at the same time. These are quite different things. If remembering huge numbers of details at one time is your definition of understanding, then humans don't understand almost anything in our world. We overcome our small short term memory by organization and external augmentation (making a few notes on paper or computer, memorizing known solutions, etc). I think you define understanding much too narrowly. I don't treat the rest of my program as unpredictable when I am working on just one small module.
In fact, I have a very clear view of what every part of the program will do when I decide to concentrate on that module or the bigger structures I have created. I don't check function arguments and have never used exceptions except where I want to catch a known condition with that structure. It makes no sense to have exceptions in general if you don't know what the error will be (so much for unpredictability), because you wouldn't be able to handle it in any meaningful way. Your comments don't support your conclusion of the non-predictability of an AGI, but as I stated in my email, an AGI with close to human intelligence would be no less/more predictable than humans are. I don't think humans are all that predictable, do you? My biggest complaint with your email was the idea of stating flat out that a brain cannot model something more complicated than itself. I disagree with this view. I think the answer is in the details. I can model the whole world easily in my limited short term memory, but how detailed and how useful that model is, is open to debate. You wrote: I think to succeed at AGI, we need to understand the theoretical limits of learning [1,2], then develop a system not based on methods that have already been shown not to work. Then build a system that can learn, give it enough raw data to do so, and set it loose. Understanding learning in humans is helpful, but why would all possible methods of gaining experience and divining solutions be embodied in humans already? If an AGI is embodied in a computer, and computers have significantly different attributes than humans, then why would human "theoretical limits of learning" be the final say for creating an AGI? I think most people on this list would agree with the "build a learning system, add data and go" theory, but the disagreement is always over how much building needs to be done to get the most optimal set of learning capabilities.
We have seen many failures of potential AI programs that tried a single learning method and got nowhere. If the answer to AGI were simple or easy, I think that solution would already have been found. The nuances of any past failed attempt at AGI also make your statement about "not based on methods that have already been shown not to work" not very useful. I don't know of anyone working on AI who thinks that their particular approach is exactly like the ones that have failed. Many outsiders to those projects might say that their basic approach has been shown not to work, but the people on that project disagree. An example would be the many NN projects that have not produced any intelligence, even though we know that human intelligence is basically based on NNs. If there is a simple message in these facts, I fail to see it, other than that no approach should be fully discounted until somebody actually succeeds at building an AGI. David Clark - Original Message ----- From: Matt Mahoney To: agi@v2.listbox.com Sent: Friday, October 06, 2006
Re: A Mind Ontology Project? [Re: [agi] method for joining efforts]
YKY, it looks like you removed the G0 page. Is this proprietary now too? http://www.geocities.com/genericai/ -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Monday, October 16, 2006 9:37:23 PM Subject: Re: A Mind Ontology Project? [Re: [agi] method for joining efforts] Re the Mind Ontology page: I have written a "glossary of terms" pertinent to our discussions, including Ben's suggestion of the terms: -- perception -- emergence -- symbol grounding -- logic and I also added many of the terms in my architecture (which is not meant to be final, only as a proposal for further discussion). I find no use of "emergence" so I left it undefined =P I suggest that web page's content should be proprietary to Novamente, because it contains some of my ideas of G0 in it. [I'd be happy to let Ben use all these ideas, perhaps in exchange for a small amount of Novamente shares. Anyway, many of my ideas came from discussions with Ben and members of this list.] Secondly, I'm not sure what the "ontology" is supposed to mean except as a clarification of terms. If so, I guess a few web pages would suffice for this purpose. Thirdly -- an important point -- I think Ben should focus on dividing the architecture into modules that can be researched and developed (relatively) independently. This is very important because even if Ben's brain has ideas that are better than each of our brains', his ideas will not be better than all of ours combined. So it would be a tremendous step forward if we could decide on a broad way of separating the modules. This I will propose on the page. Personally, I wish to specialize in pattern recognition and perhaps vision (since Ben is also doing some vision).
YKY - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/[EMAIL PROTECTED]
Re: [agi] SOTA
- Comprehensive (common-sense) knowledge-bases and/or ontologies: Cyc/OpenCyc, WordNet, etc., but there seems to be no good way for applications to use this information and no good alternative to hand coding knowledge.
- Inference engines, etc.
- Adaptive expert systems: A dead end. There has been little progress since the 1970's.
- Question answering systems: Google.
- NLP components such as parsers, translators, grammar-checkers: Parsing is unsolved. Translators like Babelfish have progressed little since the 1959 Russian-English project. Microsoft Word's grammar checker catches some mistakes but is clearly not AI.
- Interactive robotics systems (sensing/actuation), physical or virtual: The Mars Rovers and the DARPA Grand Challenge (robotic auto race) are impressive, but we clearly have a long way to go before your car drives itself.
- Vision, voice, pattern recognition, etc.: It is difficult to judge face recognition systems; because of their use in security, accuracy rates are secret. I believe they have been oversold. Voice recognition is limited to words and short phrases until we develop better language models with AI behind them. A keyboard is still faster than a microphone.
- Interactive learning systems
- Integrated intelligent systems: Lots of theoretical results, but no real applications.
-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] SOTA
- Original Message From: BillK [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, October 19, 2006 11:43:46 AM Subject: Re: [agi] SOTA On 10/19/06, Matt Mahoney wrote: - NLP components such as parsers, translators, grammar-checkers Parsing is unsolved. Translators like Babelfish have progressed little since the 1959 Russian-English project. Microsoft Word's grammar checker catches some mistakes but is clearly not AI. http://www.charlotte.com/mld/charlotte/news/nation/15783022.htm I think the problem will eventually be solved. There was a long period of stagnation since the 1959 Russian-English project but I think this period will soon end thanks to better language models due to the recent availability of large text databases, fast hardware, and cheap memory. Once we solve the language modeling problem, we will remove the main barrier to many NLP problems such as speech recognition, translation, OCR, handwriting recognition, and question answering. Google has made good progress in this area using statistical modeling methods and was top ranked in a recent competition. Google has access to terabytes of text in many languages and a custom operating system for running programs in parallel on thousands of PCs. Here is Google's translation of the above article into Arabic and back to English. But as you can see, the job isn't finished. American soldiers heading to Iraq with a laptop translators from Stephanie Hinatz daily newspapers (Newport News,va. (ethnic)نورفولكVa. army-star trip now using similar instrument in Iraq to help the forces of language training without contact with Iraqi civilians and the training of the country's emerging police and military forces. the name of a double discourse to address Albernamjoho translator, which uses computers to convert spoken English Iraqi pwmound and vice versa. while the program is still technically in the research and development stage,Norfolk-based U.S. 
Joint Forces Command,in conjunction with the Defense Advanced Research projects Agency,some models has been sent to Iraq, 70 troops is used in tactical environments to evaluate its effectiveness. and so far is fine and said Wayne Richards,Commander leadership in the implementation section. the need for such a device for the first time in April 2004 when the joint forces command received an urgent request from commanders on the ground in Abragherichards. soldiers on the ground needed to improve communication with the Iraqi people. But because of the shortage of linguists and translators throughout the Department of Defense do not come from the difficult,even some of the forces of the so-called most important work in Iraq today in Iraq, the training of police and military forces. get those troops trained and capable of maintaining the security of the country itself is a reminder of return for service members to continue der inside and outside the war zone. experts are trying to develop this kind of technical translation for 10 years,He said that Richards. today, in its current form,The translator is the rugged laptop with the plugs are two or loudspeakers and Alsmaatrichards pointing to a model and convert. It is also easy to use Talking on the phone,as evidenced shortly after the Norfolk demonstration Tuesday. I tell you, an Iraqi withdrawal on a computer. you put the microphone up to your mouth. when he said :We are here to provide food and water for your family, You held by the E key to security in a painting keys. you,I wrote to you the text of what we discussed to delight on the screen. you wipe the words to make sure you get exactly. If you can change it manually. when you are convinced you to the t key to the interpretation and sentence looming on the screen once Achrihzh time in Arab Iraq. the computer also says his loud speakers through. the process is the same Balanceof those who did not talk to you. 
I repeat what you have and the Arab computer will spit on you, the words in the English language. as do translator rights,the program assumes some meanings. not 100% Richards. when I ask,For example,Can the newspaper today, the Arab-language Alanklizihaltrgmeh direct Can the newspaper today. because in any act made in every conversation with the translator is taken. any translation is not due to the past program. Defense Language Institute in California also true of all the translations and Richards. now,because of its size,the best place to use the translator is at the center of command and control or a classroom. It is unlikely that the average Navy will be overseeing the cart with 100 pounds of equipment to implement that attacks in Baghdad, in Sadr City. We hope if the days will be small enough that the sergeant to be implemented in a skirt. Think about it and Richards. sergeant beating on the door of the house formulateseen in Fallujah. a woman answers the door. The soldier's weapon. because it is afraid. the soldier immediately to the effects translator
Re: [agi] SOTA
- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, October 20, 2006 10:14:09 AM Subject: Re: [agi] SOTA We have been searching for decades to find shortcuts to fit our machines? When you send a child into her bedroom to search for a missing library book, she will often come back 15 seconds after entering the room with the statement "I searched for it and it's not there." Drawing definite conclusions is about equally reliable in both these cases. If you have figured out how to implement AI on a PC, please share it with us. Until then, you will need a more convincing argument that we aren't limited by hardware. A lot of people smarter than you or me have been working on this problem for a lot longer than 15 seconds. James first proposed association models of thought in 1890, about 90 years before connectionist neural models were popular. Hebb proposed a model of classical conditioning in which memory is stored in the synapse in 1949, decades before the phenomenon was actually observed in living organisms. By the early 1960s we had programs that could answer natural language queries (the 1959 BASEBALL program), translate Russian to English, prove theorems in geometry, solve arithmetic word problems, and recognize handwritten digits. It is not that we can't come up with the right algorithms. It's that we don't have the computing power to implement them. The most successful AI applications today, like Google, require vast computing power. If the brain used its hardware in such a way that (say) a million neurons were required to implement a function that, on a computer, required a few hundred gates, your comparisons would be meaningless. I doubt the brain is that inefficient. There are lower animals that crawl with just a couple hundred neurons. In higher animals, neural processing is expensive, so there is evolutionary pressure to compute efficiently. Most of the energy you burn at rest is used by your brain.
Humans had to evolve larger bodies than other primates to support our larger brains. In most neural models, it takes only one neuron to implement a logic gate and only one synapse to store a bit of memory. It used to be a standing joke in AI that researchers would claim there was nothing wrong with their basic approach, they just needed more computing power to make it work. That was two decades ago: has this lesson been forgotten already? I don't see why this should not still be true. The problem is we still do not know just how much computing power is needed. There is still no good estimate of the number of synapses in the human brain. We only know it is probably between 10^12 and 10^15, and we aren't even sure of that. So when AI is solved, it will probably be a surprise. -- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] SOTA
- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, October 20, 2006 3:35:57 PM Subject: Re: [agi] SOTA On 10/20/06, Matt Mahoney [EMAIL PROTECTED] wrote: It is not that we can't come up with the right algorithms. It's that we don't have the computing power to implement them. Can you give us an example? I hope you don't mean algorithms like exhaustive search. For example, neural networks which perform rudimentary pattern detection and control for vision, speech, language, robotics, etc. Most of the theory had been worked out by the 1980's, but applications have been limited by CPU speed, memory, and training data. The basic building blocks were worked out much earlier. There are only two types of learning in animals: classical (association) and operant (reinforcement) conditioning. Hebb's rule for classical conditioning, proposed in 1949, is the basis for most neural network learning algorithms today. Models of operant conditioning date back to W. Ross Ashby's 1960 Design for a Brain, where he used randomized weight adjustments to stabilize a 4 neuron system built from vacuum tubes and mechanical components. Neural algorithms are not intractable. They run in polynomial time. Neural networks can recognize arbitrarily complex patterns by adding more layers and training them one at a time. This parallels the way people learn complex behavior. We learn simple patterns first, then build on them. The most successful AI applications today like Google require vast computing power. In what sense do you call Google an AI application? Google does pretty well with natural language questions like "how many days until xmas?" even though they don't advertise it that way (like Ask Jeeves did) and most people don't use it that way. Of course you might say that Google isn't doing AI, it is just matching query terms to documents. But it is always that way. Once you solve the problem, it's not AI any more. Deep Blue isn't AI.
It just implements a chess playing algorithm in fast hardware. Suppose we decide the easiest way to build a huge neural network is to use real neurons and some genetic engineering. Is that AI? -- Matt Mahoney, [EMAIL PROTECTED]
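Hebb's rule, referenced above as the basis for most neural learning algorithms, can be sketched in a few lines. This is the generic textbook form (the learning rate and activity values are arbitrary illustrative choices, not a model of any specific system):

```python
def hebb_update(w, pre, post, lr=0.1):
    """One Hebbian step for a single synapse: the weight grows in
    proportion to the product of pre- and post-synaptic activity
    ("cells that fire together wire together")."""
    return w + lr * pre * post

# Pair a stimulus with a response a few times and the connection
# strengthens -- a crude analogue of classical conditioning.
w = 0.0
for _ in range(5):
    w = hebb_update(w, pre=1.0, post=1.0)
print(w)  # grows to approximately 0.5 after five coincident firings
```

Note that this bare form only strengthens connections; practical variants add decay or normalization to keep weights bounded.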
Re: [agi] SOTA
With regard to the computational requirements of AI, there is a very clear relation showing that the quality of a language model improves by adding time and memory, as shown in the following table: http://cs.fit.edu/~mmahoney/compression/text.html And with the size of the training set, as shown in this graph: http://cs.fit.edu/~mmahoney/dissertation/ Before you argue that text compression has nothing to do with AI, please read http://cs.fit.edu/~mmahoney/compression/rationale.html I recognize that language modeling is just one small aspect of AGI. But compression gives us hard numbers to compare the work of over 80 researchers spanning decades. The best performing systems push the hardware to its limits. This, and the evolutionary arguments I gave earlier lead me to believe that AGI will require a lot of computing power. Exactly how much, nobody knows. Whether or not AGI can be accomplished most efficiently with neural networks is an open question. But the one working system we know of is based on it, and we ought to study it. One critical piece of missing knowledge is the density of synapses in the human brain. I think this could be resolved by putting some brain tissue under an electron microscope, but I guess that the number is not important to neurobiologists. I read Pei Wang's paper, http://nars.wang.googlepages.com/wang.AGI-CNN.pdf Some of the shortcomings of neural networks mentioned only apply to classical (feedforward or symmetric) neural networks, not to asymmetric networks with recurrent circuits and time delay elements, as exist in the brain. Such circuits allow for short term stable or oscillating states which overcome some shortcomings such as the inability to train on multiple goals, which could be accomplished by turning parts of the network on or off. Also, it is not true that training has to be offline using multiple passes, as with backpropagation. 
Human language is structured so that layers can be trained progressively without the need to search over hidden units. Word associations like sun-moon or to-from are linear. Some of the top compressors mentioned above (paq8, WinRK) use online, single-pass neural networks to combine models, alternating prediction and training. But it is interesting that most of the remaining shortcomings are also shortcomings of human thought, such as the inability to insert or represent structured knowledge accurately. This is evidence that our models are correct. That does not mean they are the best answer. We don't want to duplicate the shortcomings of humans. We do not want to slow down our responses and insert errors in order to pass the Turing test (as in Turing's 1950 example). -- Matt Mahoney, [EMAIL PROTECTED]
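The online model-combining just described can be illustrated with a simplified sketch. This is a toy version of the general idea (combining model probabilities in the logit domain with weights adjusted by online gradient steps after each bit), not the actual paq8 code; the learning rate, toy models, and bit sequence are all contrived:

```python
import math

def squash(x):
    """Logistic function: map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def stretch(p):
    """Inverse logistic: map a model's probability to the logit domain."""
    return math.log(p / (1.0 - p))

def mix(weights, probs):
    """Weighted combination of model predictions in the logit domain."""
    return squash(sum(w * stretch(p) for w, p in zip(weights, probs)))

def update(weights, probs, bit, lr=0.02):
    """One online gradient step after observing the actual bit (0 or 1)."""
    p = mix(weights, probs)
    err = bit - p
    return [w + lr * err * stretch(pi) for w, pi in zip(weights, probs)]

# Two toy models: one predicts 1s are likely (p=0.9), one that they
# are rare (p=0.2). On a mostly-1 source, the mixer learns to trust
# the model whose predictions fit the data.
weights = [0.0, 0.0]
for bit in [1, 1, 1, 0, 1, 1, 1, 1]:
    weights = update(weights, [0.9, 0.2], bit)
print(weights[0] > weights[1])  # the better model gets the larger weight
```

Prediction and training alternate on every symbol, so the mixer adapts in a single pass, which is the property claimed above for the top compressors.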
Re: [agi] SOTA
- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, October 21, 2006 5:25:13 PM Subject: Re: [agi] SOTA For example, the human mind and some other AI techniques handle structured knowledge much better than NN does. Is this because the brain is representing the knowledge differently than a classical neural network, or because the brain has a lot more memory and can afford to represent structured knowledge inefficiently? I agree with the conclusion of your paper that a classical neural network is not sufficient to solve AGI. The brain is much more complex than that. But I think a neural architecture or a hybrid system that includes neural networks of some type is the right direction. For example, Novamente (if I understand correctly, a weighted hypergraph) has some resemblance to a neural network. -- Matt Mahoney, [EMAIL PROTECTED]
[agi] Language modeling
- Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, October 21, 2006 7:03:39 PM Subject: Re: [agi] SOTA Well, in that sense NARS also has some resemblance to a neural network, as well as many other AI systems. Also to Novamente, if I understand correctly. Terms are linked by a probability and confidence. This seems to me to be an optimization of a neural network or connectionist model, which is restricted to one number per link, representing probability. To model confidence you would have to make redundant copies of the input and output units and their connections. This would be inefficient, of course. One aspect of NARS and many other structured or semi-structured knowledge representations that concerns me is the direct representation of concepts such as is-a, equivalence, logic (if-then, and, or, not), quantifiers (all, some), time (before and after), etc. These things seem fundamental to knowledge but are very hard to represent in a neural network, so it seems expedient to add them directly. My concern is that the direct encoding of such knowledge greatly complicates attempts to use natural language, which is still an unsolved problem. Language is the only aspect of intelligence that separates humans from other animals. Without language, you do not have AGI (IMHO). A related concern is that structured knowledge is inconsistent with the development of language in children. As I mentioned earlier, natural language has a structure that allows direct training in neural networks using fast, online algorithms such as perceptron learning, rather than slow algorithms with hidden units such as back propagation. Each feature is a linear combination of previously learned features followed by a nonlinear clamping or threshold operation. Working in this fashion, we can represent arbitrarily complex concepts. In a connectionist model, we have, for example:
- pixels
- line segments
- letters
- words
- phrases, parts of speech
- sentences
etc.
Children also learn language as a progression toward increasingly complex patterns:
- phonemes, beginning at 2-4 weeks
- phonological rules for segmenting continuous speech, at 7-10 months [1]
- words (semantics), beginning at 12 months
- simple sentences (syntax), at 2-3 years
- compound sentences, around 5-6 years
Attempts to change the modeling order are generally unsuccessful. For example, attempting to parse a sentence first and then extract its meaning does not work. You cannot parse a sentence without semantics. For example, the correct parse of "I ate pizza with NP" depends on whether NP is pepperoni, a fork, or Sam. Now when we hard code knowledge about logic, quantifiers, time, and other concepts and then try to retrofit NLP to it, we are modeling language in the worst possible order. Such concepts, needed to form compound sentences, are learned at the last stage of language development. In fact, some tribal languages such as Piraha [2] do not ever reach this stage, even for adults. My caution is that any language model we develop has to be trainable in order from simple to complex. The model has to be able to first learn simple sentences in the absence of any knowledge of logical relations, and then there must be a mechanism for learning such relations. I realize that human models of logical relations must be horribly inefficient, given how long it takes children to learn them. I think to solve AGI, we need to develop a better understanding of such models. I do not hold out too much hope for a computationally efficient solution, given our long past record of failure. [1] Jusczyk, Peter W. (1996), "Investigations of the word segmentation abilities of infants", 4th Intl. Conf. on Speech and Language Processing, Vol. 3, 1561-1564. [2] The Piraha challenge: an Amazonian tribe takes grammar to a strange place, Science News, Dec. 10, 2005, http://www.findarticles.com/p/articles/mi_m1200/is_24_168/ai_n16029317/pg_1 -- Matt Mahoney, [EMAIL PROTECTED]
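The fast, online perceptron learning mentioned above can be sketched as follows. The task (learning logical OR) is an arbitrary toy choice; the point is the training style: a single thresholded linear unit, updated one example at a time, with no hidden layers and no search:

```python
def predict(w, b, x):
    """A feature unit: linear combination of inputs, then a threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(examples, epochs=10, lr=1.0):
    """Online perceptron rule: nudge weights only when the unit errs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in examples:
            err = target - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy task: the unit learns logical OR of two binary inputs.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(examples)
print([predict(w, b, x) for x, _ in examples])  # [0, 1, 1, 1]
```

Each learned unit can then serve as an input to higher-level units, giving the layered progression (pixels, line segments, letters, words, ...) described above without ever needing backpropagation through hidden units.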
Re: [agi] Language modeling
I am interested in identifying barriers to language modeling and how to overcome them. I have no doubt that probabilistic models such as NARS and Novamente can adequately represent human knowledge. Also, I have no doubt they can learn, e.g., relations such as "all frogs are green" from examples of green frogs. My question relates to solving the language problem: how to convert natural language statements like "frogs are green" and equivalent variants into the formal internal representation without the need for humans to encode stuff like (for all X, frog(X) => green(X)). This problem is hard because there might not be terms that exactly correspond to frog or green, and also because interpreting natural language statements is not always straightforward, e.g. "I know it was either a frog or a leaf because it was green." Converting natural language to a formal representation requires language modeling at the highest level. The levels from lowest to highest are: phonemes, word segmentation rules, semantics, simple sentences, compound sentences. Regardless of whether your child learned to read at age 3 or not at all, children always learn language in this order. The state of the art in language modeling is at the level of simple sentences, modeling syntax using n-grams (usually trigrams) or hidden Markov models, generally without recursion (flat), and modeling semantics as word associations, possibly generalizing via LSA or clustering to exploit the transitive property (if A means B and B means C, then A means C). This is the level of modeling of the top text compressors on the large text benchmark and of the lowest perplexity models used in speech recognition. I gave an example of a Google translation of English to Arabic and back. You may have noticed that strings of up to about 6 words looked grammatically correct, but that longer sequences contained errors. This is a characteristic of trigram models.
Shannon noted in 1949 that random sequences that fit the n-gram (letter or word) statistics of English appear correct up to about 2n. All of these models have the property that they are trained in the same order that children learn language. For example, parsing sentences without semantics is difficult, but extracting semantics without parsing (text search) is easy. As a second example, it is possible to build a lexicon from text only if you know the rules for word segmentation. However, the reverse is not true. It is not necessary to have a lexicon to segment continuous text (spaces removed). The segmentation rules can be derived from n-gram statistics, analogous to learning the phonological rules for segmenting continuous speech. This was first demonstrated in text by Hutchens and Alder, which I improved on in 1999. http://cs.fit.edu/~mmahoney/dissertation/lex1.html With this observation, it seems that hard coding rules for inheritance, equivalence, logical, temporal, etc. relations into a knowledge representation will not help in learning these relations from text. The language model still has to learn these relations from previously learned, simpler concepts. In other words, the model has to learn the meanings of is, and, not, if-then, all, before, etc. without any help from the structure of the knowledge representation or explicit encoding. The model has to first learn how to convert compound sentences into a formal representation and back, and only then can it start using or adding to the knowledge base. So my question is: what is needed to extend language models to the level of compound sentences? More training data? Different training data? A new theory of language acquisition? More hardware? How much? -- Matt Mahoney, [EMAIL PROTECTED]
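As a toy illustration of deriving segmentation from n-gram statistics alone: the sketch below is a simplification in the spirit of the Hutchens and Alder approach, not the actual algorithm from the 1999 paper, and the corpus and threshold are contrived. A boundary is guessed wherever the two-character context has many distinct successors, i.e. where the next character is unpredictable:

```python
from collections import defaultdict

# Train on continuous text with spaces removed: no lexicon is used.
corpus = "thecatsatonthematthecatran" * 20

# For each two-character context, record which characters can follow it.
successors = defaultdict(set)
for i in range(len(corpus) - 2):
    successors[corpus[i:i+2]].add(corpus[i+2])

def segment(text, threshold=2):
    """Insert a boundary wherever the preceding context has at least
    `threshold` distinct successors (high successor variety)."""
    out = [text[0], text[1]]
    for i in range(2, len(text)):
        if len(successors[text[i-2:i]]) >= threshold:
            out.append(" ")
        out.append(text[i])
    return "".join(out)

print(segment("thecatsatonthemat"))  # -> "the cat sat onthe mat"
```

It recovers most word boundaries but misses on|the, where this tiny corpus happens to make the transition fully predictable. That kind of partial, probabilistic success, improving with more data, is characteristic of segmentation learned purely from statistics.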
Re: [agi] Language modeling
- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 24, 2006 12:37:16 PM Subject: Re: [agi] Language modeling

Matt Mahoney wrote: "Converting natural language to a formal representation requires language modeling at the highest level. The levels from lowest to highest are: phonemes, word segmentation rules, semantics, simple sentences, compound sentences. Regardless of whether your child learned to read at age 3 or not at all, children always learn language in this order."

"And the evidence for this would be what?"

Um, any textbook on psycholinguistics or developmental psychology; also the paper by Jusczyk I cited earlier. Ben pointed me to a book by Tomasello which I haven't read, but here is a good summary of his work on language acquisition in children. http://email.eva.mpg.de/~tomas/pdf/Mussen_chap_proofs.pdf

I realize that the stages of language learning overlap, but they do not all start at the same time. It is a simple fact that children learn words with semantic content like "ball" or "milk" before they learn function words like "the" or "of", in spite of the higher frequency of the latter. Likewise, successful language models used for information retrieval ignore function words and word order. Furthermore, children learn word segmentation rules before they learn words, again consistent with statistical language models. (The fact that children can learn sign language at 6 months is not inconsistent with these models; sign language does not have the word segmentation problem.)

We can learn from these observations. One conclusion that I draw is that you can't build an AGI and tack on language modeling later. You have to integrate language modeling and train it in parallel with nonverbal skills such as vision and motor control, similar to training a child. We don't know today whether this will turn out to be true. Another important question is: how much will this cost?
How much CPU, memory, and training data do you need? Again we can use cognitive models to help answer these questions. According to Tomasello, children are exposed to about 5000 to 7000 utterances per day, or about 20,000 words. This is equivalent to about 100 MB of text in 3 years. Children learn to use simple sentences of the form subject-verb-object and recognize word order in these sentences at about 22-24 months. For example, they respond correctly to "make the bunny push the horse". However, such models are word specific. At about age 3 1/2, children are able to generalize novel words used in context as a verb to other syntactic constructs, e.g. to construct transitive sentences given examples where the verb is used only intransitively. This is about the state of the art with statistical models trained on hundreds of megabytes of text. Such experiments suggest that adult-level modeling, which will be needed to interface with structured knowledge bases, will require about a gigabyte of training data.

-- Matt Mahoney, [EMAIL PROTECTED]
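The 100 MB figure follows from simple arithmetic, assuming an average English word is about 5 bytes including the trailing space (an assumed figure, not stated in the message):

```python
# Back-of-envelope check of the training-data estimate above.
words_per_day = 20_000
days = 3 * 365
bytes_per_word = 5  # assumption: average word length plus a space
total_bytes = words_per_day * days * bytes_per_word
print(total_bytes / 1e6)  # about 110 MB, the same order as the 100 MB cited
```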
Re: [agi] Motivational Systems that are stable
My comment on Richard Loosemore's proposal: we should not be confident in our ability to produce a stable motivational system. We observe that motivational systems are highly stable in animals (including humans). This is only because if an animal can manipulate its motivations in any way, it is quickly removed by natural selection. Examples of manipulation might be to turn off pain or hunger or reproductive drive, or to stimulate its pleasure center. Humans can do this to some extent by using drugs, but this leads to self-destructive behavior. In experiments where a mouse can stimulate its pleasure center via an electrode in its brain by pressing a lever, it will press the lever, foregoing food and water until it dies.

So we should not take the existence of stable motivational systems in nature as evidence that we can get it right. These systems are complex, have evolved over a long time, and even then don't always work in the face of technology or a rapidly changing environment.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Motivational Systems that are stable
- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, October 28, 2006 10:23:58 AM Subject: Re: [agi] Motivational Systems that are stable

"I disagree that humans really have a 'stable motivational system', or would have to have a much more strict interpretation of that phrase. Overall, humans as a society have in general a stable system (discounting war, etc.), but as individuals, too many humans are unstable in many small if not totally self-destructive ways."

I think we are misunderstanding. By "motivational system" I mean the part of the brain (or AGI) that provides the reinforcement signal (reward or penalty). By "stable", I mean that you have no control over the logic of this system. You cannot train it like you can train the other parts of your brain. You cannot learn to turn off pain or hunger or fear or fatigue or the need for sleep, etc. You cannot alter your emotional state. You cannot make yourself feel happy on demand. You cannot make yourself like what you don't like, and vice versa. The pathways from your senses to the pain/pleasure centers of your brain are hardwired, determined by genetics and not alterable through learning.

For an AGI it is very important that the motivational system be stable. The AGI should not be able to reprogram it. If it could, it could simply program itself for maximum pleasure and enter a degenerate state where it ceases to learn through reinforcement. It would be like the mouse that presses a lever to stimulate the pleasure center of its brain until it dies.

It is also very important that the motivational system be correct. If the goal is that an AGI be friendly or obedient (whatever that means), then there needs to be a fixed function of some inputs that reliably detects friendliness or obedience. Maybe this is as simple as a human user pressing a button to signal pain or pleasure to the AGI.
Maybe it is something more complex, like a visual system that recognizes facial expressions to tell if the user is happy or mad. If the AGI is autonomous, it is likely to be extremely complex. Whatever it is, it has to be correct.

To answer your other question, I am working on natural language processing, although my approach is somewhat unusual. http://cs.fit.edu/~mmahoney/compression/text.html

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Natural versus formal AI interface languages
I guess the AI problem is solved, then. I can already communicate with my computer using formal, unambiguous languages. It already does a lot of things better than most humans, like arithmetic, chess, memorizing long lists and recalling them perfectly... If a machine can't pass the Turing test, then what is your definition of intelligence?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: John Scanlon [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 31, 2006 8:48:43 AM Subject: [agi] Natural versus formal AI interface languages

One of the major obstacles to real AI is the belief that knowledge of a natural language is necessary for intelligence. A human-level intelligent system should be expected to have the ability to learn a natural language, but it is not necessary. It is better to start with a formal language, with unambiguous formal syntax, as the primary interface between human beings and AI systems. This type of language could be called a "para-natural formal language." It eliminates all of the syntactical ambiguity that makes competent use of a natural language so difficult to implement in an AI system. Such a language would also be a member of the class "fifth-generation computer language."
Re: [agi] Natural versus formal AI interface languages
Artificial languages that remove ambiguity, like Lojban, do not bring us any closer to solving the AI problem. It is straightforward to convert between artificial languages and structured knowledge (e.g., first-order logic), but it is still a hard (AI-complete) problem to convert between natural and artificial languages. If you could translate English -> Lojban -> English, then you could just as well translate, e.g., English -> Lojban -> Russian. Without a natural language model, you have no access to the vast knowledge base of the Internet, or to most of the human race. I know people can learn Lojban, just like they can learn CycL or LISP. Let's not repeat these mistakes. This is not training, it is programming a knowledge base. This is narrow AI.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Natural versus formal AI interface languages
- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, October 31, 2006 9:26:15 PM Subject: Re: Re: [agi] Natural versus formal AI interface languages

"Here is how I intend to use Lojban++ in teaching Novamente. When Novamente is controlling a humanoid agent in the AGISim simulation world, the human teacher talks to it about what it is doing. I would like the human teacher to talk to it in both Lojban++ and English, at the same time. According to my understanding of Novamente's learning and reasoning methods, this will be the optimal way of getting the system to understand English. At once, the system will get a perceptual-motor grounding for the English sentences, plus an understanding of the logical meaning of the sentences. I can think of no better way to help a system understand English. Yes, this is not the way humans do it. But so what? Novamente does not have a human brain, it has a different sort of infrastructure with different strengths and weaknesses."

What about using baby English instead of an artificial language? By this I mean simple English at the level of a 2 or 3 year old child. Baby English has many of the properties that make artificial languages desirable, such as a small vocabulary, simple syntax and lack of ambiguity. Adult English is ambiguous because adults can use vast knowledge and context to resolve ambiguity in complex sentences. Children lack these abilities.

I don't believe it is possible to map between natural and structured language without solving the natural language modeling problem first. I don't believe that having structured knowledge or a structured language available makes the problem any easier. It is just something else to learn. Humans learn natural language without having to learn structured languages, grammar rules, knowledge representation, etc. I realize that Novamente is different from the human brain.
My argument is based on the structure of natural language, which is vastly different from artificial languages used for knowledge representation. To wit:

- Artificial languages are designed to be processed (translated or compiled) in the order: lexical tokenization, syntactic parsing, semantic extraction. This does not work for natural language. The correct order is the order in which children learn: lexical, semantics, syntax. Thus we have successful language models that extract semantics without syntax (such as information retrieval and text categorization), but not vice versa.

- Artificial language has a structure optimized for serial processing. Natural language is optimized for parallel processing. We resolve ambiguity and errors using context. Context detection is a type of parallel pattern recognition. Patterns can be letters, groups of letters, words, word categories, phrases, and syntactic structures. We recognize and combine perhaps tens or hundreds of patterns simultaneously by matching against perhaps 10^5 or more from memory. Artificial languages have no such mechanism and cannot tolerate ambiguity or errors.

- Natural language has a structure that allows incremental learning. We can add words to the vocabulary one at a time. Likewise for phrases, idioms, classes of words, and syntactic structures. Artificial languages must be processed by fixed algorithms. Learning algorithms are unknown.

- Natural languages evolve slowly in a social environment. Artificial languages are fixed according to some specification.

- Children can learn natural languages. Artificial languages are difficult to learn, even for adults.

- Writing in an artificial language is an iterative process in which the output is checked for errors by a computer and the utterance is revised. Natural language uses both iterative and forward error correction.

By natural language I include man-made languages like Esperanto.
Esperanto was designed for communication between humans and has all the other properties of natural language. It lacks irregular verbs and such, but this is really a tiny part of a language's complexity. A natural language like English has a complexity of about 10^9 bits. How much information does it take to list all the irregularities in English like swim-swam, mouse-mice, etc.?

-- Matt Mahoney, [EMAIL PROTECTED]
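The rhetorical question invites a rough estimate. The counts and per-entry cost below are illustrative assumptions, not measured values:

```python
# Rough estimate of the information needed to list English irregular forms.
# Assumptions: a few hundred irregular verbs plus ~100 irregular plurals,
# each entry costing ~100 bits (roughly two short words of text).
irregular_entries = 300 + 100
bits_per_entry = 100
irregular_bits = irregular_entries * bits_per_entry   # 40,000 bits
language_bits = 1e9                                   # estimate cited above
print(irregular_bits / language_bits)  # about 4e-05: a tiny fraction
```

Even if these assumed counts are off by an order of magnitude, the irregularities remain a negligible part of the 10^9-bit total.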
Re: [agi] Natural versus formal AI interface languages
I don't know enough about Novamente to say if your approach would work. Using an artificial language as part of the environment (as opposed to a substitute for natural language) does seem to make sense.

I think an interesting goal would be to teach an AGI to write software. If I understand your explanation, this is the same problem. I want to teach the AGI two languages (English and x86-64 machine code), one to talk to me and the other to define its environment. I would like to say to the AGI, "write a program to print the numbers 1 through 100", "are there any security flaws in this web browser?", and ultimately, "write a program like yourself, but smarter". This is obviously a hard problem, even if I substitute a more English-like programming language like COBOL. To solve the first example, the AGI needs an adult-level understanding of English and arithmetic. To solve the second, it needs a comprehensive world model, including an understanding of how people think and the things they can experience. (If an embedded image can set a cookie, is this a security flaw?) When it can solve the third, we are in trouble (topic for another list). How could such an AGI be built? What would be its architecture? What learning algorithm? What training data? What computational cost?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 2, 2006 3:45:42 PM Subject: Re: Re: [agi] Natural versus formal AI interface languages

Yes, teaching an AI in Esperanto would make more sense than teaching it in English ... but, would not serve the same purpose as teaching it in Lojban++ and a natural language in parallel... In fact, an ideal educational programme would probably be to use, in parallel -- an Esperanto-based, rather than English-based, version of Lojban++ -- Esperanto. However, I hasten to emphasize that this whole discussion is (IMO) largely peripheral to AGI.
The main point is to get the learning algorithms and knowledge representation mechanisms right. (Or if the learning algorithm learns its own KRs, that's fine too...) Once one has what seems like a workable learning/representation framework, THEN one starts talking about the right educational programme. Discussing education in the absence of an understanding of internal learning algorithms is perhaps confusing...

Before developing Novamente in detail, I would not have liked the idea of using Lojban++ to help teach an AGI, for much the same reasons that you are now complaining. But now, given the specifics of the Novamente system, it turns out that this approach may actually make teaching the system considerably easier -- and make the system more rapidly approach the point where it can rapidly learn natural language on its own. To use Eric Baum's language, it may be that by interacting with the system in Lojban++, we human teachers can supply the baby Novamente with much of the inductive bias that humans are born with, and that helps us humans to learn natural languages so relatively easily.

I guess that's a good way to put it. Not that learning Lojban++ is a substitute for learning English, rather that the knowledge gained via interaction in Lojban++ may be a substitute for human babies' language-focused and spacetime-focused inductive bias. Of course, Lojban++ can be used in this way **only** with AGI systems that combine -- a robust reinforcement learning capability -- an explicitly logic-based knowledge representation. But Novamente does combine these two factors. I don't expect to convince you that this approach is a good one, but perhaps I have made my motivations clearer, at any rate. I am appreciating this conversation, as it is pushing me to verbally articulate my views more clearly than I had done before.
-- Ben G
Re: Re: [agi] Natural versus formal AI interface languages
- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 3, 2006 9:28:24 PM Subject: Re: Re: [agi] Natural versus formal AI interface languages

"I do not agree that having precise quantitative measures of system intelligence is critical, or even important to AGI."

The reason I ask is not just to compare different systems (which you can't really do if they serve different purposes), but also to measure progress. When I experiment with language models, I often try many variations, tune parameters, etc., so I need a quick test to see if what I did worked. I can do that very quickly using text compression. I can test tens or hundreds of slightly different models per day and make very precise measurements. Of course it is also useful that I can tell if my model works better or worse than somebody else's model that uses a completely different method.

There does not seem to be much cooperation on this list toward the goal of achieving AGI. Everyone has their own ideas. That's OK. The purpose of having a metric is not to make it a race, but to help us communicate what works and what doesn't so we can work together while still pursuing our own ideas. Papers on language modeling do this by comparing different algorithms and reporting the results by word perplexity. So you don't have to re-experiment with various n-gram backoff models, LSA, statistical parsers, etc. You already know a lot about what works and what doesn't.

Another reason for measurements is that it makes your goals concrete. How do you define general intelligence? Turing gave us a well defined goal, but there are some shortcomings. The Turing test is subjective, time consuming, isn't appropriate for robotics, and really isn't a good goal if it means deliberately degrading performance in order to appear human. So I am looking for better tests. I don't believe the approach of "let's just build it and see what it does" is going to produce anything useful.
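The two metrics mentioned here, compression and word perplexity, are directly related: a model with perplexity P costs log2(P) bits per word under arithmetic coding, so compressed size and perplexity rank models identically. A small sketch of the conversion:

```python
import math

def bits_per_word(perplexity):
    """Cross-entropy in bits per word for a model with the given perplexity."""
    return math.log2(perplexity)

def perplexity(bits_per_word):
    """Inverse mapping: perplexity is 2 raised to the bits per word."""
    return 2.0 ** bits_per_word

print(bits_per_word(256))  # 8.0 bits per word
```

This is why a compression benchmark can substitute for perplexity reporting when comparing language models.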
-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Natural versus formal AI interface languages
Ben, the test you described (Easter Egg Hunt) is a perfectly good example of the type of test I was looking for. When you run the experiment you will no doubt repeat it many times, adjusting various parameters. Then you will evaluate by how many eggs are found, how fast, and the extent to which it helps the system learn to play Hide and Seek (also a measurable quantity). Two other good qualities are that the test is easy to describe and obviously relevant to intelligence. For text compression, the relevance is not so obvious. I look forward to seeing a paper on the outcome of the tests.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 3, 2006 10:51:16 PM Subject: Re: Re: Re: Re: [agi] Natural versus formal AI interface languages

I am happy enough with the long-term goal of independent scientific and mathematical discovery... And, in the short term, I am happy enough with the goals of carrying out the (AGISim versions of) the standard tasks used by developmental psychologists to study children's cognitive behavior... I don't see a real value to precisely quantifying these goals, though...

To give an example of the kind of short-term goal that I think is useful, though, consider the following. We are in early 2007 (if all goes according to plan) going to teach Novamente to carry out a game called iterated Easter Egg hunt -- basically, to carry out an Easter Egg hunt in a room full of other agents ... and then do so over and over again, modeling what the other agents do and adjusting its behavior accordingly. Now, this task has a bit in common with the game Hide-and-Seek. So, you'd expect that a Novamente instance that had been taught iterated Easter Egg Hunt would also be good at hide-and-seek. So, we want to see that the time required for an NM system to learn hide-and-seek will be less if the NM system has previously learned to play iterated Easter Egg hunt...
This sort of goal is, I feel, good for infant-stage AGI education. However, I wouldn't want to try to turn it into an objective IQ test. Our goal is not to make the best possible system for playing Easter Egg hunt or hide and seek or fetch or whatever. And, in terms of language learning, our initial goal will not be to make the best possible system for conversing in baby-talk... Rather, our goal will be to make a system that can adequately fulfill these early-stage tasks, but in a way that we feel will be indefinitely generalizable to more complex tasks.

This, I'm afraid, highlights a general issue with formal quantitative intelligence measures as applied to immature AGI systems/minds. Often the best way to achieve some early-developmental-stage task is going to be an overfitted, narrow-AI type of algorithm, which is not easily extendable to address more complex tasks. This is similar to my complaint about the Hutter Prize. Yah, a superhuman AGI will be an awesome text compressor. But this doesn't mean that the best way to achieve slightly better text compression than current methods is going to be **at all** extensible in the direction of AGI.

Matt, you have yet to convince me that seeking to optimize interim quantitative milestones is a meaningful path to AGI. I think it is probably just a path to creating milestone-task-overfit narrow-AI systems without any real AGI-related expansion potential...

-- Ben
Re: [agi] Natural versus formal AI interface languages
Another important lesson from SHRDLU, aside from discovering that the approach of hand-coding knowledge doesn't work, was how long it took to discover this. It was not at all obvious from the initial success. Cycorp still hasn't figured it out after over 20 years.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Charles D Hixson [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, November 5, 2006 4:46:12 PM Subject: Re: [agi] Natural versus formal AI interface languages

Richard Loosemore wrote: "... This is a question directed at this whole thread, about simplifying language to communicate with an AI system, so we can at least get something working, and then go from there. This rationale is the very same rationale that drove researchers into Blocks World programs. Winograd and SHRDLU, etc. It was a mistake then; it is surely just as much of a mistake now. Richard Loosemore."

Not surely. It's definitely a defensible position, but I don't see any evidence that it has even a 50% probability of being correct. Also I'm not certain that SHRDLU and Blocks World were mistakes. They didn't succeed in their goals, but they remain as important markers. At each step we have limitations imposed by both our knowledge and our resources. These limits aren't constant. (P.S.: I'd throw Eliza into this same category... even though the purpose behind Eliza was different.) Think of the various approaches taken as being experiments with the user interface... since that's a large part of what they were. They are, of course, also experiments with how far one can push a given technique before encountering a combinatorial explosion. People don't seem very good at understanding that intuitively. In neural nets this same problem re-appears as saturation, the point at which, as you learn new things, old things become fuzzier and less certain. This may have some relevance to the way that people are continually re-writing their memories whenever they remember something.
Re: [agi] Natural versus formal AI interface languages
- Original Message From: BillK [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Monday, November 6, 2006 10:08:09 AM Subject: Re: [agi] Natural versus formal AI interface languages

"Ogden said that it would take seven years to learn English, seven months for Esperanto, and seven weeks for Basic English, comparable with Ido."

Basic English = 850 words = about 17 words per day. Esperanto = 900 root forms or 17,000 words (http://www.freelang.net/dictionary/esperanto.html) = 4 to 80 words per day. English = 30,000 to 80,000 words = 12 to 30 words per day. SHRDLU = 200 words? = 0.3 words per day for 2 years.

-- Matt Mahoney, [EMAIL PROTECTED]
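As a check on the arithmetic behind these learning rates (assuming 7 weeks = 49 days, 7 months ≈ 213 days, and 7 years ≈ 2557 days):

```python
# Implied vocabulary-acquisition rates, in words per day.
weeks7 = 49
months7 = 7 * 30.4
years7 = 7 * 365.25

basic_english = 850 / weeks7            # ~17 words/day
esperanto_roots = 900 / months7         # ~4 words/day
esperanto_words = 17_000 / months7      # ~80 words/day
english_low = 30_000 / years7           # ~12 words/day
english_high = 80_000 / years7          # ~31 words/day
shrdlu = 200 / (2 * 365)                # ~0.3 words/day over 2 years
```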
Re: [agi] The concept of a KBMS
- Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Monday, November 6, 2006 7:49:06 PM Subject: Re: [agi] The concept of a KBMS

"This is the specification of my logic: http://www.geocities.com/genericai/GI-Geniform.htm I conjecture that NL sentences can be easily translated to/from this form."

I conjecture it will be hard. Here is why. If it were easy to translate between natural language and an unambiguous, structured form, then it would be easy to translate between two natural languages, e.g. Russian -> Geniform -> English. This problem is known to be hard.

What does the prepositional phrase "with" modify in "I ate pizza with {pepperoni, a fork, gusto, George}"?

What does "they" refer to in (from Lenat) "The police arrested the demonstrators because they {feared, advocated} violence"?

What does "it" refer to in "it is raining"?

Is the following sentence correct: "The cat caught a moose"?

What is the structured representation of "What?"

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] The crux of the problem
James Ratcliff [EMAIL PROTECTED] wrote: "Many of these examples actually aren't hard, if you use some statistical information and a common sense knowledge base."

The problem is not that these examples are hard, but that there are millions of them. To parse English you have to know that pizzas have pepperoni, that demonstrators advocate violence, that cats chase mice, and so on. There is no neat, tidy algorithm that will generate all of this knowledge. You can't do any better than to just write down all of these facts. The data is not compressible. I said millions, but we really don't know; maybe 10^9 bits. We have a long history of underestimating the complexity of natural language, going back to SHRDLU, Eliza, and the 1959 BASEBALL program, all of which could parse simple sentences. Cycorp is the only group that has actually collected this much common human knowledge in a structured form. They probably did not expect it would take 20 years of manual coding, only to discover you can't build the knowledge base first and then tack on a natural language interface later. Something is still wrong.

We have many ways to represent knowledge: LISP lists, frame-slot, augmented first-order logic, term logic, Bayesian, connectionist, NARS, Novamente, etc. Humans can easily take sentences and convert them into the internal representation of any of these systems. Yet none of these systems has solved the natural language interface problem. Why is this?

You can't ignore information theory. A Turing machine can't model another machine with greater Kolmogorov complexity. The brain can't understand itself. We want to build data structures where we can see how knowledge is represented so we can test and debug our systems. Sorry, information theory doesn't allow it. You can't have your AGI and understand it too. We need to think about opaque representations, systems we can train and test without looking inside, systems that work but we don't know how.
This will be hard, but we have already tried the easy ways.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 7, 2006 9:38:54 AM Subject: Re: [agi] The concept of a KBMS

Matt Mahoney wrote: "What does the prepositional phrase "with" modify in "I ate pizza with {pepperoni, a fork, gusto, George}"?"

It is simple to show that there is a type of pizza that is a pepperoni pizza, but not a fork pizza, etc. The others all have different roles that are recognizable by the word type they have. This would create frames similar to:

ate(Person, pepperoni pizza)
ate(Person, pizza, with Utensil)
ate(Person, pizza, with Feeling)
ate(Person, pizza, with Person)

So the eat action would show the different types of modifiers it would expect, and when it saw something different it would try to fit it into one of the expected slots, or a new slot/frame definition would need to be created.

"What does "they" refer to in (from Lenat) "The police arrested the demonstrators because they {feared, advocated} violence"?"

This one is harder, but... Statistically, on the first pass, "police feared violence" has 62 instances, and "demonstrators feared violence" has 0. Then if we expand the "violence" term to attacks and riots, we see police: 50+40, demonstrators: 0. So we have overwhelming evidence there for the police fearing it.
Grammatically we assume the closest match, which is demonstrators, so those have to be reconciled together to come up with police.

Is the following sentence correct: "The cat caught a moose"?

This can actually be handled fairly well. Looking at a frame of cat and moose, we can statistically see that it is a rare if not non-existent event that a cat can catch a moose. Now in theory this could be a sci-fi book where a huge cat did catch the moose, but that would have to be learned with more context information. A frame for Cat catching would show about
15% mouse
5% rats
5% bird
3% others
A general statement can be made that "cats catch small animals" and that matches most items. It is mentioned once on the net, by an unreliable quotes page, that a "cat caught a moose", and once in a fairy tale (The Violet Fairy Book - The Nunda) a cat caught a donkey. But for general common sense these types of sources would be too far from the norm and are
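The hit-count disambiguation James sketches above can be written down directly. This is a minimal sketch, not his implementation; the count table below is a hypothetical stand-in for the search-engine hit counts he quotes.

```python
# Resolve "they" in "The police arrested the demonstrators because they
# feared violence" by scoring each candidate antecedent against corpus
# co-occurrence counts. The counts are hypothetical stand-ins for the
# thread's search-engine figures.
counts = {
    ("police", "feared violence"): 62,
    ("demonstrators", "feared violence"): 0,
    ("police", "feared attacks"): 50,
    ("police", "feared riots"): 40,
    ("demonstrators", "feared attacks"): 0,
    ("demonstrators", "feared riots"): 0,
}

def resolve(candidates, predicates):
    """Pick the candidate with the most corpus support for the predicates."""
    scores = {c: sum(counts.get((c, p), 0) for p in predicates)
              for c in candidates}
    return max(scores, key=scores.get), scores

antecedent, scores = resolve(
    ["police", "demonstrators"],
    ["feared violence", "feared attacks", "feared riots"])
print(antecedent, scores)  # police wins, 152 to 0
```

As the thread notes, the purely grammatical heuristic (closest noun phrase) picks "demonstrators", so the statistical score has to override it.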
Re: Re: RE: [agi] Natural versus formal AI interface languages
Ben Goertzel [EMAIL PROTECTED] wrote: I am afraid that it may not be possible to find an initial project that is both
* small
* clearly a meaningfully large step along the path to AGI
* of significant practical benefit

I'm afraid you're right. It is especially difficult because there is a long history of small (i.e. narrow AI) projects that appear superficially to be meaningful steps toward AGI. Sometimes it is decades before we discover that they don't scale.
-- Matt Mahoney, [EMAIL PROTECTED]

- This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Natural versus formal AI interface languages
I think that natural language and the human genome have about the same order of magnitude complexity. The genome is 6 x 10^9 bits (2 bits per base pair) uncompressed, but there is a lot of noncoding DNA and some redundancy. By decoding, I assume you mean building a model and understanding the genome to the point where you could modify it and predict what will happen.

The complexity of natural language is probably 10^9 bits. This is supported by:
- Turing's 1950 estimate, which he did not explain.
- Landauer's estimate of human long term memory capacity.
- The quantity of language processed by an average adult, times Shannon's estimate of the entropy of written English of 1 bit per character.
- Extrapolating the relationship between language model training set size and compression ratio in this graph: http://cs.fit.edu/~mmahoney/dissertation/

I don't think the encryption of the genome is any worse. Complex systems (that have high Kolmogorov complexity, are incrementally updatable, and do useful computation) tend to converge to the boundary between stability and chaos, where some perturbations decay while others grow. A characteristic of such systems (as studied by Kauffman) is that the number of stable states or attractors tends to the square root of the size. The number of human genes is about the same as the size of the human vocabulary, about 30,000. Neither system is encrypted in the mathematical sense. Encryption cannot be an emergent property because it is at the extreme chaotic end of the spectrum. Changing one bit of the key or plaintext affects every bit of the ciphertext. The difference is that it is easier (faster and more ethical) to experiment with language models than the human genome.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Eliezer S.
Yudkowsky [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 8, 2006 3:23:10 PM Subject: Re: [agi] Natural versus formal AI interface languages

Eric Baum wrote: (Why should producing a human-level AI be cheaper than decoding the genome?)

Because the genome is encrypted even worse than natural language.
-- Eliezer S. Yudkowsky http://singinst.org/ Research Fellow, Singularity Institute for Artificial Intelligence
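Two of the round-number claims above are easy to sanity-check. These are back-of-envelope checks on the thread's own estimates, not new data:

```python
import math

# Kauffman: the number of stable states or attractors in a complex system
# tends toward the square root of its size. For a 10^9-bit system:
attractors = math.sqrt(1e9)
print(attractors)  # ~31,623: the same order as the ~30,000 genes/words

# Lifetime language exposure: roughly 1 GB of text (~10^9 characters) at
# Shannon's ~1 bit per character of entropy:
language_bits = 1e9 * 1.0
print(language_bits)  # ~10^9 bits, matching the language-model estimate
```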
Re: [agi] The crux of the problem
James,

Many of the solutions you describe can use information gathered from statistical models, which are opaque. I need to elaborate on this, because I think opaque models will be fundamental to solving AGI. We need to build models in a way that doesn't require access to the internals. This requires a different approach than traditional knowledge representation. It will require black box testing and performance metrics. It will be less of an engineering approach, and more of an experimental one.

Information retrieval is a good example. It is really simple. You type a question, and the system matches the words in your query to words in the document and ranks the documents by TF*IDF (term frequency times log inverse document frequency). This is an opaque model. We normally build an index, but this is really just an optimization. The language model is just the documents themselves. There is no good theory to explain why it works. It just does.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 8, 2006 10:14:43 AM Subject: Re: [agi] The crux of the problem

Matt: To parse English you have to know that pizzas have pepperoni, that demonstrators advocate violence, that cats chase mice, and so on. There is no neat, tidy algorithm that will generate all of this knowledge. You can't do any better than to just write down all of these facts. The data is not compressible.

James: You CAN actually, simply because there are patterns; anytime there are patterns, there is regularity, and the ability to compress things. And those things are limited, even if on a super-large scale.
The problem with that is the irregular parts, which have to be handled, and the amount of bad data, which has to be handled. But a simple example is:
ate a pepperoni pizza
ate a tuna pizza
ate a VEGAN SUPREME pizza
ate a Mexican pizza
ate a pineapple pizza
And we can see right off that these are different types of pizza topping, and we can compress that into a frame easily:
Frame Pizza:
  can have Toppings: pepperoni, tuna, pineapple
  can be Type: vegan supreme, Mexican
This does take some work, and does require some good data, but can be done. We can take that further to gather probabilities and confidences about the Pizza frame, such that we can determine that a pepperoni pizza is the most likely if a random pizza is ordered. This does not give a perfect collection of information, but a lot can be garnered just from this. This does not solve the AI problem, but does give us a nice building block of Knowledge to start working with. This is a much preferred method over hand-coding each piece, as Cyc has seen, and they are currently coding and using many algorithms now that take advantage of statistical NLP and Google to assist and suggest answers, and check the answers they have in place.

There is a simple pattern between Nouns and Verbs as well that can be taken out and extracted with relative ease, and also between Adj and Nouns, and Adv and Verbs. Ex: The dog eats, barks, growls, sniffs, attacks, alerts. That gives us an initial store of information about a dog frame. Then if given Rover barked at the mailmen.
we can programmatically narrow the possibilities about what Actor can fulfill the "bark" role, and see that dogs bark, and are most likely to bark at the mailman, and give a probability and confidence.

One problem I have with your task of text compression is the stricture that it retain exactly the same text, as opposed to exactly the same Information. For a computer science data transmission issue the first is important, but for an AI issue the latter is more important. "The dog sniffed the shoes." and "The dog smelled the shoes." are so very close in meaning as to be an acceptable representation of the event, and many things can be reduced to their component parts, or even use a more common synonym or word root. And it is much more important that the system would be able to answer the question "What did the dog sniff/smell?" as opposed to keeping the data exactly the same. As long as the answers come out the same, the internal representation could be in Chinese or marks in the sand.

James Ratcliff
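The TF*IDF ranking Matt describes above (term frequency times log inverse document frequency) fits in a few lines. This is a minimal sketch with three invented toy documents, not a production retrieval system:

```python
import math
from collections import Counter

# Score each document by sum over query terms of
# (term frequency in document) * log(N / number of documents with the term).
docs = {
    "d1": "the cat ate a mouse",
    "d2": "the cat sat on the mat",
    "d3": "dogs bark at the mailman",
}

def tfidf_rank(query, docs):
    words = {d: Counter(text.split()) for d, text in docs.items()}
    n = len(docs)
    def score(d):
        s = 0.0
        for t in query.split():
            df = sum(1 for c in words.values() if t in c)  # document frequency
            if df:
                s += words[d][t] * math.log(n / df)
        return s
    return sorted(docs, key=score, reverse=True)

print(tfidf_rank("cat mouse", docs))  # d1 ranks first
```

Note how opaque this is in exactly the sense Matt means: there is no explicit knowledge structure anywhere, just the documents and a scoring rule.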
Re: [agi] The crux of the problem
James Ratcliff [EMAIL PROTECTED] wrote:
Matt, expand upon the first part as you said there please.

I argued earlier that a natural language model has a complexity of about 10^9 bits. To be precise, let p(s) be a function that outputs an estimate of the probability that string s will appear as a prefix in human discourse, such as might occur in a Turing test between a judge and human confederate. If p(s) is a good estimate of the true probability for most s, then this model could be used to pass the Turing test as follows: if Q is the dialog so far, then the machine will respond with answer A by selecting randomly from the distribution p(A|Q) = p(QA)/p(Q). I argue that the Kolmogorov complexity of a function p() which is sufficiently accurate to pass the Turing test is about 10^9 bits.

My argument that a language model must be opaque is based on the premise that the human brain cannot understand itself, for the same reason that a Turing machine cannot simulate another Turing machine with greater Kolmogorov complexity. This is not to say we can't build a brain. There are simple learning algorithms that can store vast knowledge. We can understand enough of the brain to describe its development, to write an algorithm for the learning mechanism and simulate its behavior. But we cannot know all of the knowledge it has learned. So we will be able to build an AGI and train it, but after we train it we cannot know everything that it knows. A transparent representation that implies otherwise is not possible.

Most AGI designs have the form of a data structure to represent knowledge, and functions to convert input to knowledge and knowledge to output: input -- knowledge representation -- output

Many knowledge representations have been proposed: frame-slot, first order logic, connectionist systems, etc.
These generally have the form of labeled graphs, where the vertices generally correspond to words, concepts, or system states, and the edges correspond to relations such as "is-a" or "contains", implications, probabilities, confidences, etc. We argue for the correctness of these models by showing how facts such as "the cat ate a mouse" can be easily represented, and give many examples.

Here is the problem. We know that the knowledge representation must have a complexity of 10^9 bits. Anything smaller cannot work. When we give examples, we usually draw graphs with just a few edges per vertex, but this is not how it will look when training is complete. Suppose there are 10^5 vertices, enough to represent a large vocabulary. Then your trained system must have about 10^4 edges per vertex. Building such a model by hand, or even trying to understand or debug it, would be hopeless. I would call such a model opaque.

It is natural for us to seek simple solutions, a "theory of everything". After all, we are agents in the sense of Hutter's AIXI following the provably optimal strategy of Occam's Razor. But in our drive to simplify and understand, we are trying to compress the language model to an impossibly small size, always misled down a dead end path by our initial successes with low complexity toy systems.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 10, 2006 9:56:00 AM Subject: Re: [agi] The crux of the problem

Matt, expand upon the first part as you said there please.
James

Matt Mahoney [EMAIL PROTECTED] wrote: James,

Many of the solutions you describe can use information gathered from statistical models, which are opaque. I need to elaborate on this, because I think opaque models will be fundamental to solving AGI. We need to build models in a way that doesn't require access to the internals. This requires a different approach than traditional knowledge representation.
It will require black box testing and performance metrics. It will be less of an engineering approach, and more of an experimental one. Information retrieval is a good example. It is really simple. You type a question, and the system matches the words in your query to words in the document and ranks the documents by TF*IDF (term frequency times log inverse document frequency). This is an opaque model. We normally build an index, but this is really just an optimization. The language model is just the documents themselves. There is no good theory to explain why it works. It just does.

-- Matt Mahoney, [EMAIL PROTECTED]
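The response rule discussed in this thread, p(A|Q) = p(QA)/p(Q), can be illustrated with a toy prefix-probability table. The table and the dialog strings below are hypothetical; a real p() would be the learned 10^9-bit model the thread describes:

```python
import random

# Toy prefix probabilities p(s). A reply A to dialog Q is sampled from
# p(A|Q) = p(QA)/p(Q), where p(Q) is found by marginalizing over replies.
p = {  # hypothetical values for illustration only
    "Q:2+2? A:4": 0.08,
    "Q:2+2? A:5": 0.01,
    "Q:2+2? A:fish": 0.01,
}
q = "Q:2+2? "
p_q = sum(v for s, v in p.items() if s.startswith(q))  # p(Q) = 0.10

def sample_reply(q):
    """Sample A from p(A|Q) = p(QA)/p(Q) by inverse transform sampling."""
    answers = [(s[len(q):], v / p_q) for s, v in p.items() if s.startswith(q)]
    r = random.random()
    for a, pr in answers:
        r -= pr
        if r <= 0:
            return a
    return answers[-1][0]

print(sample_reply(q))  # "A:4" with probability 0.8
```

The point of sampling rather than taking the most likely reply is that a Turing-test machine should show human-like variability, not always give the single modal answer.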
Re: [agi] Natural versus formal AI interface languages
The security of Enigma depended on the secrecy of the algorithm in addition to the key. This violated Kerckhoffs' principle, the requirement that a system be secure against an adversary who has everything except the key. This mistake has been repeated many times by amateur cryptographers who thought that keeping the algorithm secret improved security. Such systems are invariably broken. Secure systems are built by publishing the algorithm so that people can try to break them before they are used for anything important. It has to be done this way because there is no provably secure system (regardless of whether P = NP), except the one time pad, which is impractical because it lacks message integrity, and the key has to be as large as the plaintext and can't be reused.

Anyway, my point is that decoding the human genome or natural language is not as hard as breaking encryption. It cannot be because these systems are incrementally updatable, unlike ciphers. This allows you to use search strategies that run in polynomial time. A key search requires exponential time, or else the cipher is broken. Modeling language or the genome in O(n) time or even O(n^2) time with n = 10^9 is much faster than brute force cryptanalysis in O(2^n) time with n = 128.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Eric Baum [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 9, 2006 12:18:34 PM Subject: Re: [agi] Natural versus formal AI interface languages

Eric Baum [EMAIL PROTECTED] wrote: Matt wrote: Changing one bit of the key or plaintext affects every bit of the ciphertext.

That is simply not true of most encryptions. For example, Enigma.

Matt: Enigma is laughably weak compared to modern encryption, such as AES, RSA, SHA-256, ECC, etc. Enigma was broken with primitive mechanical computers and pencil and paper.
Enigma was broken without modern computers, *given access to the machine.* I chose Enigma as an example, because to break language it may be necessary to pay attention to the machine-- namely examining the genomics. But that is more work than you envisage ;^)

It is true that much modern encryption is based on simple algorithms. However, some crypto-experts would advise more primitive approaches. RSA is not known to be hard; even if P!=NP, someone may find a number-theoretic trick tomorrow that factors. (Or maybe they already have it, and choose not to publish). If you use a mess machine like a modern version of Enigma, that is much less likely to get broken, even though you may not have the theoretical results.

Your response admits that for stream ciphers changing a bit of the plaintext doesn't affect many bits of the ciphertext, which was what I was mainly responding to. You may prefer other kinds of cipher, but your arguments about chaos are clearly not germane to concluding language is easy to decode. Incidentally, while no encryption scheme is provably hard to break (even assuming P!=NP), more is known about grammars: they are provably hard to decode given P!=NP.
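The running-time comparison in Matt's message above is easy to check directly:

```python
# Modeling in O(n^2) time with n = 10^9 versus brute-force key search in
# O(2^n) time with n = 128, as claimed in the message above.
n_model = 10**9
model_steps = n_model**2        # 10^18 steps
key_steps = 2**128              # ~3.4 x 10^38 steps

print(model_steps)
print(key_steps > model_steps)          # True
print(key_steps // model_steps)         # ~3.4 x 10^20 times more work
```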
Re: [agi] One grammar parser URL
http://josie.stanford.edu:8080/parser/Fails the Turing test :-) "I ate pizza with {pepperoni|George|chopsticks}" all have the same parse.-- Matt Mahoney, [EMAIL PROTECTED]- Original Message From: James Ratcliff [EMAIL PROTECTED]To: agi@v2.listbox.comSent: Sunday, November 12, 2006 1:11:32 PMSubject: [agi] One grammar parser URLDuring the grammar NLP discussion, someone asked about various parsers, well here is one that I am looking at now Download http://nlp.stanford.edu/software/lex-parser.shtmlStanfords parser, and a online version is here http://josie.stanford.edu:8080/parser/James ___James Ratcliff - http://falazar.comNew Torrent Site, Has TV and Movie Downloads! http://www.falazar.com/projects/Torrents/tvtorrents_show.php Everyone is raving about the all-new Yahoo! Mail beta. This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303 This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Natural versus formal AI interface languages
Eric, can you give an example of a one way function (such as a cryptographic hash or cipher) produced by evolution or by a genetic algorithm? A one-way function f has the property that y = f(x) is easy to compute, but it is hard to find x given f and y. Other examples might be modular exponentiation in large finite groups, or multiplication of prime numbers with thousands of digits. By incrementally updatable, I mean that you can make a small change to a system and the result will be a small change in behavior. For example, most DNA mutations have a small effect. We try to design software systems with this property so we can modify them without breaking them. However, as the system gets bigger, there is more interaction between components, until it reaches the point where every change introduces more bugs than it fixes and the code becomes unmaintainable. This is what happens when the system crosses the boundary from stability to chaotic. My argument for Kauffman's observation that complex systems sit on this boundary is that stable systems are less useful, but chaotic systems can't be developed as a long sequence of small steps. We are able to produce cryptosystems only because they are relatively simple, and even then it is hard. I don't dispute that learning some simple grammars is NP-hard. However, I don't believe that natural language is one of these grammars. It certainly is not simple. The human brain is less powerful than a Turing machine, so it has no special ability to solve NP-hard problems. The fact that humans can learn natural language is proof enough that it can be done. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Eric Baum [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, November 12, 2006 9:29:13 AM Subject: Re: [agi] Natural versus formal AI interface languages Matt wrote: Anyway, my point is that decoding the human genome or natural language is n= ot as hard as breaking encryption. 
It cannot be because these systems are incrementally updatable, unlike ciphers. This allows you to use search strategies that run in polynomial time. A key search requires exponential time, or else the cipher is broken. Modeling language or the genome in O(n) time or even O(n^2) time with n = 10^9 is much faster than brute force cryptanalysis in O(2^n) time with n = 128.

I don't know what you mean by incrementally updateable, but if you look up the literature on language learning, you will find that learning various sorts of relatively simple grammars from examples, or even if memory serves examples and queries, is NP-hard. Try looking for Dana Angluin's papers back in the 80's. If your claim is that evolution can not produce a 1-way function, that's crazy.
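For the one-way function discussion above, multiplication of primes is the standard candidate Matt mentions: the forward direction is a single multiply, while the only generic inverse sketched here is brute-force trial division. The primes below are tiny toy values, nowhere near cryptographic sizes:

```python
# Candidate one-way function: y = p * q for primes p, q.
# Easy direction: one multiplication. Hard direction: factoring.
p, q = 1000003, 1000033   # small primes, for illustration only
n = p * q                 # easy: instant

def factor(n):
    """Brute-force trial division: feasible here only because n is tiny."""
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    return None

print(n)          # 1000036000099
print(factor(n))  # (1000003, 1000033), after ~500,000 trial divisions
```

For primes with thousands of digits the forward direction stays trivial while no known algorithm inverts it in feasible time, which is the asymmetry the definition requires.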
Re: Re: [agi] A question on the symbol-system hypothesis
James Ratcliff [EMAIL PROTECTED] wrote:
Well, words and language based ideas/terms adequately describe much of the upper levels of human interaction and seem appropriate in that case. It fails of course when it devolves down to the physical level, i.e. vision or motor cortex skills, but other than that, using language internally would seem natural, and be much easier to look inside the box and see what is going on and correct the system's behaviour.

No, no, no, that is why AI failed. You can't look inside the box because it's 10^9 bits. Models that are simple enough to debug are too simple to scale. How many times will we repeat this mistake? The contents of a knowledge base for AGI will be beyond our ability to comprehend. Get over it. It will require a different approach:
1. Develop a quantifiable criterion for success, a test score.
2. Develop a theory of learning.
3. Develop a training and test set (about 10^9 bits compressed).
4. Tune the learning model to improve the score.

Example:
1. Criterion: SAT analogy test score.
2. Theory: word association matrix reduced by singular value decomposition (SVD).
3. Data: 50M word corpus of news articles.
4. Results: http://iit-iti.nrc-cnrc.gc.ca/iit-publications-iti/docs/NRC-48255.pdf

An SVD factored word association matrix seems pretty opaque to me. You can't point to which matrix elements represent associations like cat-dog, moon-star, etc., nor will you be inserting such knowledge for testing. If you want to understand it, you have to look at the learning algorithm. It turns out that there is an efficient neural model for SVD. http://gen.gorrellville.com/gorrell06.pdf

It should not take decades to develop a knowledge base like Cyc. Statistical approaches can do this in a matter of minutes or hours.

-- Matt Mahoney, [EMAIL PROTECTED]
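A minimal sketch of the SVD-factored word association matrix discussed above. The co-occurrence counts are invented toy data (the NRC paper used a 50M-word corpus), and it shows the opacity point directly: no single matrix element "is" the cat-dog association; it is spread across the factored representation.

```python
import numpy as np

words = ["cat", "dog", "moon", "star"]
contexts = ["pet", "vet", "night", "sky"]
# Hypothetical word-by-context co-occurrence counts:
counts = np.array([[10, 6, 0, 1],
                   [9, 7, 1, 0],
                   [0, 1, 9, 7],
                   [1, 0, 8, 8]], dtype=float)

# Factor and keep k latent dimensions; word vectors are rows of U scaled
# by the singular values.
u, s, vt = np.linalg.svd(counts)
k = 2
vecs = u[:, :k] * s[:k]

def sim(a, b):
    """Cosine similarity of two words in the reduced space."""
    va, vb = vecs[words.index(a)], vecs[words.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(sim("cat", "dog"), sim("moon", "star"), sim("cat", "star"))
```

Running it shows cat-dog and moon-star come out far more similar than cat-star, even though no individual number in `u`, `s`, or `vt` can be pointed to as that fact.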
Re: [agi] A question on the symbol-system hypothesis
I will try to answer several posts here.

I said that the knowledge base of an AGI must be opaque because it has 10^9 bits of information, which is more than a person can comprehend. By opaque, I mean that you can't do any better by examining or modifying the internal representation than you could by examining or modifying the training data. For a text based AI with natural language ability, the 10^9 bits of training data would be about a gigabyte of text, about 1000 books. Of course you can sample it, add to it, edit it, search it, run various tests on it, and so on. What you can't do is read, write, or know all of it. There is no internal representation that you could convert it to that would allow you to do these things, because you still have 10^9 bits of information. It is a limitation of the human brain that it can't store more information than this. It doesn't matter if you agree with the number 10^9 or not. Whatever the number, either the AGI stores less information than the brain, in which case it is not AGI, or it stores more, in which case you can't know everything it does.

Mark Waser wrote: I certainly don't buy the mystical approach that says that sufficiently large neural nets will come up with sufficiently complex discoveries that we can't understand them.

James Ratcliff wrote: Having looked at the neural network type AI algorithms, I don't see any fathomable way that that type of architecture could create a full AGI by itself.

Nobody has created an AGI yet. Currently the only working model of intelligence we have is based on neural networks. Just because we can't understand it doesn't mean it is wrong.

James Ratcliff wrote: Also it is a critical task for expert systems to explain why they are doing what they are doing, and for business applications, I for one am not going to blindly trust what the AI says, without a little background.

I expect this ability to be part of a natural language model.
However, any explanation will be based on the language model, not the internal workings of the knowledge representation. That remains opaque. For example: Q: Why did you turn left here? A: Because I need gas. There is no need to explain that there is an opening in the traffic, that you can see a place where you can turn left without going off the road, that the gas gauge reads E, and that you learned that turning the steering wheel counterclockwise makes the car turn left, even though all of this is part of the thought process. The language model is responsible for knowing that you already know this. There is no need either (or even the ability) to explain the sequence of neuron firings from your eyes to your arm muscles.

and this is one of the requirements for the Project Halo contest (took and passed the AP chemistry exam) http://www.projecthalo.com/halotempl.asp?cid=30

This is a perfect example of why a transparent KR does not scale. The expert system described was coded from 70 pages of a chemistry textbook in 28 person-months. Assuming 1K bits per page, this is a rate of 4 minutes per bit, or 2500 times slower than transmitting the same knowledge as natural language.

Mark Waser wrote: Given sufficient time, anything should be able to be understood and debugged. ... Give me *one* counter-example to the above . . . .

Google. You cannot predict the results of a search. It does not help that you have full access to the Internet. It would not help even if Google gave you full access to their server. When we build AGI, we will understand it the way we understand Google. We know how a search engine works. We will understand how learning works. But we will not be able to predict or control what we build, even if we poke inside.

-- Matt Mahoney, [EMAIL PROTECTED]
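The Project Halo rate claimed above can be recomputed under an assumed 160 working hours per person-month; the 70 pages, 1K bits/page, and 28 person-months figures are the thread's own:

```python
# Knowledge-entry rate for the Project Halo expert system, as estimated
# in the message above. The 160 hours/person-month figure is an assumption.
pages, bits_per_page = 70, 1000
person_months = 28
minutes = person_months * 160 * 60          # 268,800 working minutes
min_per_bit = minutes / (pages * bits_per_page)
print(round(min_per_bit, 1))                # ~3.8 minutes per bit

# Versus reading text at ~150 words/min, ~5 chars/word, ~1 bit/char:
reading_bits_per_min = 150 * 5 * 1
print(min_per_bit * reading_bits_per_min)   # ~2900x slower, same order as the 2500x claim
```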
Re: [agi] One grammar parser URL
1. No can do. The algorithmic complexity of parsing natural language as well as an average adult human is around 10^9 bits. There is no small grammar for English. 2. You need semantics to parse natural language. This is part of what makes it hard. Or do you want a parser that gives you wrong answers? I can do that. 3. If translating natural language to a structured representation is not hard, then do it. People have been working on this for 50 years without success. Doing logical inference is the easy part. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 8:59:45 AM Subject: Re: [agi] One grammar parser URL Several things: 1. Someone suggested these parsers to me: Eugene Charniak's http://www.cog.brown.edu/Research/nlp/resources.html Dan Bikel's http://www.cis.upenn.edu/~dbikel/software.html Demos for both are at: http://lfg-demo.computing.dcu.ie/lfgparser.html It seems that they are similar in function to the Stanford parser. I'd prefer smaller grammars and parsers with smaller memory footprints. 2. I ate pizza with {pepperoni|George|chopsticks} yielding the same parse should be expected. The difference of those sentences is in semantics, and the word with is overloaded with several meanings. The parser is only responsible for syntactic aspects. 3. Translating English sentences to Geniform or some other logical form may not be that hard, but after the translation we have to store the facts in a generic memory and use them for inference. For those, we need a canonical form, to organize the facts via clustering, and to keep track of what facts support other facts. All these are big problems. I'm looking for someone to do the translating so I can work on inference and generic memory. It is easier for one person to focus on one task, such as translation, for several formats. Another can focus on inference for several formats, etc. 
Then we can help each other while still exploring different ideas.
YKY
Re: [agi] A question on the symbol-system hypothesis
Richard Loosemore [EMAIL PROTECTED] wrote: Understanding 10^9 bits of information is not the same as storing 10^9 bits of information.

That is true. Understanding n bits is the same as compressing some larger training set that has an algorithmic complexity of n bits. Once you have done this, you can use your probability model to make predictions about unseen data generated by the same (unknown) Turing machine as the training data. The closer to n you can compress, the better your predictions will be.

I am not sure what it means to understand a painting, but let's say that you understand art if you can identify the artists of paintings you haven't seen before with better accuracy than random guessing. The relevant quantity of information is not the number of pixels and resolution, which depend on the limits of the eye, but the (much smaller) number of features that the high level perceptual centers of the brain are capable of distinguishing and storing in memory. (Experiments by Standing and Landauer suggest it is a few bits per second for long term memory, the same rate as language). Then you guess the shortest program that generates a list of feature-artist pairs consistent with your knowledge of art and use it to predict artists given new features.

My estimate of 10^9 bits for a language model is based on 4 lines of evidence, one of which is the amount of language you process in a lifetime. This is a rough estimate of course. I estimate 1 GB (8 x 10^9 bits) compressed to 1 bpc (Shannon) and assume you remember a significant fraction of that.

Landauer, Tom (1986), “How much do people remember? Some estimates of the quantity of learned information in long term memory”, Cognitive Science (10) pp. 477-493.
Shannon, Claude E. (1950), “Prediction and Entropy of Printed English”, Bell Sys. Tech. J (3) pp. 50-64.
Standing, L. (1973), “Learning 10,000 Pictures”, Quarterly Journal of Experimental Psychology (25) pp. 207-222.
-- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 9:33:04 AM Subject: Re: [agi] A question on the symbol-system hypothesis Matt Mahoney wrote: I will try to answer several posts here. I said that the knowledge base of an AGI must be opaque because it has 10^9 bits of information, which is more than a person can comprehend. By opaque, I mean that you can't do any better by examining or modifying the internal representation than you could by examining or modifying the training data. For a text based AI with natural language ability, the 10^9 bits of training data would be about a gigabyte of text, about 1000 books. Of course you can sample it, add to it, edit it, search it, run various tests on it, and so on. What you can't do is read, write, or know all of it. There is no internal representation that you could convert it to that would allow you to do these things, because you still have 10^9 bits of information. It is a limitation of the human brain that it can't store more information than this. Understanding 10^9 bits of information is not the same as storing 10^9 bits of information. A typical painting in the Louvre might be 1 meter on a side. At roughly 16 pixels per millimeter, and a perceivable color depth of about 20 bits that would be about 10^8 bits. If an art specialist knew all about, say, 1000 paintings in the Louvre, that specialist would understand a total of about 10^11 bits. You might be inclined to say that not all of those bits count, that many are redundant to understanding. Exactly. People can easily comprehend 10^9 bits. It makes no sense to argue about degree of comprehension by quoting numbers of bits. 
Richard Loosemore

- This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
Sorry if I did not make clear the distinction between knowing the learning algorithm for AGI (which we can do) and knowing what was learned (which we can't). My point about Google is to illustrate that distinction. The Google database is about 10^14 bits. (It keeps a copy of the searchable part of the Internet in RAM.) The algorithm is deterministic. You could, in principle, model the Google server in a more powerful machine and use it to predict the result of a search. But where does this get you? You can't predict the result of the simulation any more than you could predict the result of the query you are simulating. In practice the human brain has finite limits just like any other computer.

My point about AGI is that constructing an internal representation that allows debugging the learned knowledge is pointless. A more powerful AGI could do it, but you can't. You can't do any better than to manipulate the input and observe the output. If you tell your robot to do something and it sits in a corner instead, you can't do any better than to ask it why, hope for a sensible answer, and retrain it. Trying to debug the reasoning for its behavior would be like trying to understand why a driver made a left turn by examining the neural firing patterns in the driver's brain.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 9:39:14 AM Subject: Re: [agi] A question on the symbol-system hypothesis

Mark Waser wrote: Given sufficient time, anything should be able to be understood and debugged. Give me *one* counter-example to the above . . . .

Matt Mahoney replied: Google. You cannot predict the results of a search. It does not help that you have full access to the Internet. It would not help even if Google gave you full access to their server.

This is simply not correct. Google uses a single non-random algorithm against a database to determine what results it returns.
As long as you don't update the database, the same query will return the exact same results and, with knowledge of the algorithm, looking at the database manually will also return the exact same results. Full access to the Internet is a red herring. Access to Google's database at the time of the query will give the exact precise answer.

This is also exactly analogous to an AGI, since access to the AGI's internal state will explain the AGI's decision (with appropriate caveats for systems that deliberately introduce randomness -- i.e. when the probability is 60/40, the AGI flips a weighted coin -- but even in those cases, the answer will still be of the form that the AGI ended up with a 60% probability of X and 40% probability of Y and the weighted coin landed on the 40% side).

When we build AGI, we will understand it the way we understand Google. We know how a search engine works. We will understand how learning works. But we will not be able to predict or control what we build, even if we poke inside.

I agree with your first three statements but again, the fourth is simply not correct (as well as a blatant invitation to UFAI). Google currently exercises numerous forms of control over their search engine. It is known that they do successfully exclude sites (for visibly trying to game PageRank, etc.). They constantly tweak their algorithms to change/improve the behavior and results.

Note also that there is a huge difference between saying that something is/can be exactly controlled (or able to be exactly predicted without knowing its exact internal state) and that something's behavior is bounded (i.e. that you can be sure that something *won't* happen -- like all of the air in a room suddenly deciding to occupy only half the room). No complex and immense system is precisely controlled but many complex and immense systems are easily bounded.
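Mark's determinism point is easy to make concrete: with a frozen database and a fixed ranking rule, the same query always returns the same results. A toy sketch (the index and ranking rule are invented for illustration and have nothing to do with Google's actual system):

```python
# Toy deterministic search over a frozen index: identical query against
# an identical database returns identical results, every time.
index = {
    "compression": ["hutter-prize.html", "paq.html", "rationale.html"],
    "agi": ["rationale.html", "listbox.html"],
}

def search(query, db):
    # Rank alphabetically; any fixed rule works -- the point is that
    # with no database update and no randomness, output is reproducible.
    return sorted(db.get(query, []))

r1 = search("compression", index)
r2 = search("compression", index)
print(r1 == r2)   # True: same query, same database, same results
```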
- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 14, 2006 10:34 PM Subject: Re: [agi] A question on the symbol-system hypothesis
Re: [agi] A question on the symbol-system hypothesis
Richard, what is your definition of understanding? How would you test whether a person understands art? Turing offered a behavioral test for intelligence. My understanding of understanding is that it is something that requires intelligence. The connection between intelligence and compression is not obvious. I have summarized the arguments here. http://cs.fit.edu/~mmahoney/compression/rationale.html

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 2:38:49 PM Subject: Re: [agi] A question on the symbol-system hypothesis
My estimate of 10^9 bits for a language model is based on 4 lines of evidence, one of which is the amount of language you process in a lifetime. This is a rough estimate of course. I estimate 1 GB (8 x 10^9 bits) compressed to 1 bpc (Shannon) and assume you remember a significant fraction of that.

Matt,

So long as you keep redefining "understand" to mean something trivial (or at least, something different in different circumstances), all you do is reinforce the point I was trying to make. In your definition of understanding in the context of art, above, you specifically choose an interpretation that enables you to pick a particular bit rate. But if I chose a different interpretation (and I certainly would - an art historian would never say they understood a painting just because they could tell the artist's style better than a random guess!), I might come up with a different bit rate. And if I chose a sufficiently subtle concept of "understand", I would be unable to come up with *any* bit rate, because that concept of "understand" would not lend itself to any easy bit rate analysis. The lesson? Talking about bits and bit rates is completely pointless, which was my point.

You mainly identify the meaning of "understand" as a variant of the meaning of "compress". I completely reject this - this is the most idiotic development in AI research since the early attempts to do natural language translation using word-by-word lookup tables - and I challenge you to say why anyone could justify reducing the term in such an extreme way. Why have you thrown out the real meaning of "understand" and substituted another meaning? What have we gained by dumbing the concept down? As I said previously, this is as crazy as redefining the complex concept of happiness to be a warm puppy.

Richard Loosemore
Re: [agi] A question on the symbol-system hypothesis
Mark Waser wrote: Are you conceding that you can predict the results of a Google search?

OK, you are right. You can type the same query twice. Or if you live long enough you can do it the hard way. But you won't.

Are you now conceding that it is not true that "Models that are simple enough to debug are too simple to scale"?

OK, you are right again. Plain text is a simple way to represent knowledge. I can search and edit terabytes of it. But this is not the point I wanted to make. I am sure I expressed it badly. The point is there are two parts to AGI, a learning algorithm and a knowledge base. The learning algorithm has low complexity. You can debug it, meaning you can examine the internals to test it and verify it is working the way you want. The knowledge base has high complexity. You can't debug it. You can examine it and edit it but you can't verify its correctness. An AGI with a correct learning algorithm might still behave badly. You can't examine the knowledge base to find out why. You can't manipulate the knowledge base data to fix it. At least you can't do these things any better than manipulating the inputs and observing the outputs. The reason is that the knowledge base is too complex. In theory you could do these things if you lived long enough, but you won't. For practical purposes, the AGI knowledge base is a black box. You need to design your goals, learning algorithm, data set and test program with this in mind. Trying to build transparency into the data structure would be pointless. Information theory forbids it. Opacity is not advantageous or desirable. It is just unavoidable.

I am sure I won't convince you, so maybe you have a different explanation why 50 years of building structured knowledge bases has not worked, and what you think can be done about it?

And Google DOES keep the searchable part of the Internet in memory http://blog.topix.net/archives/11.html because they have enough hardware to do it.
http://en.wikipedia.org/wiki/Supercomputer#Quasi-supercomputing

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] A question on the symbol-system hypothesis
1. The fact that AIXItl is intractable is not relevant to the proof that compression = intelligence, any more than the fact that AIXI is not computable. In fact it is supporting, because it says that both are hard problems, in agreement with observation.

2. Do not confuse the two compressions. AIXI proves that the optimal behavior of a goal seeking agent is to guess the shortest program consistent with its interaction with the environment so far. This is lossless compression. A typical implementation is to perform some pattern recognition on the inputs to identify features that are useful for prediction. We sometimes call this lossy compression because we are discarding irrelevant data. If we anthropomorphise the agent, then we say that we are replacing the input with perceptually indistinguishable data, which is what we typically do when we compress video or sound.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 3:48:37 PM Subject: Re: [agi] A question on the symbol-system hypothesis

The connection between intelligence and compression is not obvious.

The connection between intelligence and compression *is* obvious -- but compression, particularly lossless compression, is clearly *NOT* intelligence. Intelligence compresses knowledge to ever simpler rules because that is an effective way of dealing with the world. Discarding ineffective/unnecessary knowledge to make way for more effective/necessary knowledge is an effective way of dealing with the world. Blindly maintaining *all* knowledge at tremendous costs is *not* an effective way of dealing with the world (i.e. it is *not* intelligent).

1. What Hutter proved is that the optimal behavior of an agent is to guess that the environment is controlled by the shortest program that is consistent with all of the interaction observed so far. The problem of finding this program is known as AIXI.

2.
The general problem is not computable [11], although Hutter proved that if we assume time bounds t and space bounds l on the environment, then this restricted problem, known as AIXItl, can be solved in O(t·2^l) time.

Very nice -- except that O(t·2^l) time is basically equivalent to incomputable for any real scenario. Hutter's proof is useless because it relies upon the assumption that you have adequate resources (i.e. time) to calculate AIXI -- which you *clearly* do not. And like any other proof, once you invalidate the assumptions, the proof becomes equally invalid. Except as an interesting but unobtainable edge case, why do you believe that Hutter has any relevance at all?

- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 2:54 PM Subject: Re: [agi] A question on the symbol-system hypothesis
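To make the O(t·2^l) objection concrete: the bound comes from enumerating every environment program of up to l bits and simulating each for t steps. A toy sketch of that brute-force search (hypothetical bitstring "programs", not Hutter's actual construction):

```python
from itertools import product

# Brute-force flavor of AIXItl: test all 2^l candidate l-bit programs,
# simulating each for up to t steps.
def search_cost(l, t):
    steps = 0
    for program in product([0, 1], repeat=l):   # 2^l candidates
        steps += t                              # t simulation steps each
    return steps

print(search_cost(l=10, t=100))   # 100 * 2**10 = 102400
# Each extra bit of program length doubles the work; realistic l is
# in the thousands, so the bound is astronomically beyond reach:
print(search_cost(l=20, t=100) // search_cost(l=10, t=100))  # 1024
```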
Re: [agi] A question on the symbol-system hypothesis
Richard Loosemore [EMAIL PROTECTED] wrote: 5) I have looked at your paper and my feelings are exactly the same as Mark's: theorems developed on erroneous assumptions are worthless.

Which assumptions are erroneous?

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Richard Loosemore [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 15, 2006 4:09:23 PM Subject: Re: [agi] A question on the symbol-system hypothesis

Matt Mahoney wrote: Richard, what is your definition of understanding? How would you test whether a person understands art? Turing offered a behavioral test for intelligence. My understanding of understanding is that it is something that requires intelligence. The connection between intelligence and compression is not obvious. I have summarized the arguments here. http://cs.fit.edu/~mmahoney/compression/rationale.html

1) There will probably never be a compact definition of understanding. Nevertheless, it is possible for us (being understanding systems) to know some of its features. I could produce a shopping list of typical features of understanding, but that would not be the same as a definition, so I will not. 2) See my paper in the forthcoming proceedings of the 2006 AGIRI workshop, for arguments. (I will make a version of this available this week, after final revisions.) 3) One tiny, almost-too-obvious-to-be-worth-stating fact about understanding is that it compresses information in order to do its job. 4) To mistake this tiny little facet of understanding for the whole is to say that a hurricane IS rotation, rather than that rotation is a facet of what a hurricane is. 5) I have looked at your paper and my feelings are exactly the same as Mark's: theorems developed on erroneous assumptions are worthless.
Richard Loosemore
Re: [agi] A question on the symbol-system hypothesis
Mark Waser [EMAIL PROTECTED] wrote: So *prove* to me why information theory forbids transparency of a knowledge base.

Isn't this pointless? I mean, if I offer any proof you will just attack the assumptions. Without assumptions, you can't even prove the universe exists. I have already stated reasons why I believe this is true. An AGI will have greater algorithmic complexity than the human brain (assumption). Transparency implies that you can examine the knowledge base and deterministically predict its output given some input (assumption about the definition of transparency). Legg proved [1] that a Turing machine cannot predict another machine of greater algorithmic complexity.

Aside from that, I can only give examples as supporting evidence. 1. The relative success of statistical language learning (opaque) compared to structured knowledge, parsing, etc. 2. It would be (presumably) easier to explain human behavior by asking questions than by examining neurons (assuming we had the technology to do this).

In your argument for transparency, you assume that individual pieces of knowledge can be isolated. Prove it. In the brain, knowledge is distributed. We make decisions by integrating many sources of evidence from all parts of the brain.

[1] Legg, Shane (2006), "Is There an Elegant Universal Theory of Prediction?", Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 9:57:40 AM Subject: Re: [agi] A question on the symbol-system hypothesis

The knowledge base has high complexity. You can't debug it. You can examine it and edit it but you can't verify its correctness.

While the knowledge base is complex, I disagree with the way in which you're attempting to use the first sentence.
The knowledge base *isn't* so complex that it causes a truly insoluble problem. The true problem is that the knowledge base will have a large enough size and will grow and change quickly enough that you can't maintain 100% control over the contents or even the integrity of it. I disagree with the second but believe that it may just be your semantics because of the third sentence. The question is what we mean by "debug". If you mean remove all incorrect knowledge, then the answer is obviously yes, we can't remove all incorrect knowledge, because odd sequences of observed events and incomplete knowledge mean that globally incorrect knowledge *is* the correct deduction from experience. On the other hand, we certainly should be able to debug how the knowledge base operates, make sure that it maintains an acceptable degree of internal integrity, and responds correctly when it detects a major integrity problem. The *process* and global behavior of the knowledge base is what is important and it *can* be debugged. Minor mistakes and errors are just the cost of being limited in an erratic world.

An AGI with a correct learning algorithm might still behave badly.

No! An AGI with a correct learning algorithm may, through an odd sequence of events and incomplete knowledge, come to an incorrect conclusion and take an action that it would not have taken if it had perfect knowledge -- BUT -- this is entirely correct behavior, not bad behavior. Calling it bad behavior dramatically obscures what you are trying to do.

You can't examine the knowledge base to find out why.

No, no, no, no, NO! If you (or the AI) can't go back through the causal chain and explain exactly why an action was taken, then you have created an unsafe AI. A given action depends upon a small part of the knowledge base (which may then depend upon ever larger sections in an ongoing pyramid) and you can debug an action and see what led to an action (that you believe is incorrect but the AI believes is correct).
You can't manipulate the knowledge base data to fix it.

Bull. You should be able to track down a piece of incorrect knowledge that led to an incorrect decision. You should be able to find the supporting knowledge structures. If the knowledge is truly incorrect, you should be able to provide evidence/experiences to the AI that lead it to correct the incorrect knowledge (or you could even just tack the correct knowledge in the knowledge base, fix it so that it temporarily can't be altered, and run your integrity repair routines -- which, I contend, any AI that is going to go anywhere must have).

At least you can't do these things any better than manipulating the inputs and observing the outputs.

No. I can find structures in the knowledge base and alter them. I would
Re: [agi] A question on the symbol-system hypothesis
My point is that humans make decisions based on millions of facts, and we do this every second. Every fact depends on other facts. The chain of reasoning covers the entire knowledge base. I said millions, but we really don't know. This is an important number. Historically we have tended to underestimate it. If the number is small, then we *can* follow the reasoning, make changes to the knowledge base and predict the outcome (provided the representation is transparent and accessible through a formal language). But this leads us down a false path. We are not so smart that we can build a machine smarter than us, and still be smarter than it. Either the AGI has more algorithmic complexity than you do, or it has less. If it has less, then you have failed. If it has more, and you try to explore the chain of reasoning, you will exhaust the memory in your brain before you finish.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Mark Waser [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 3:16:54 PM Subject: Re: [agi] A question on the symbol-system hypothesis

I consider the last question in each of your examples to be unreasonable (though for very different reasons). In the first case, "What do you see?" is a nonsensical and unnecessary extension on a rational chain of logic. The visual subsystem, which is not part of the AGI, has reported something and, unless there is a good reason not to, the AGI should believe it as a valid fact and the root of a knowledge chain. Extending past this point to ask a spurious, open question is silly. Doing so is entirely unnecessary. This knowledge chain is isolated.

In the second case, I don't know why you're doing any sort of search (particularly since there wasn't any sort of question preceding it). The AI needed gas, it found a gas station, and it headed for it. You asked why it waited until a given time and it told you. How is this not isolated?
- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 3:01 PM Subject: Re: [agi] A question on the symbol-system hypothesis

Mark Waser [EMAIL PROTECTED] wrote: Give me a counter-example of knowledge that can't be isolated.

Q. Why did you turn left here?
A. Because I need gas.
Q. Why do you need gas?
A. Because the tank is almost empty.
Q. How do you know?
A. Because the needle is on E.
Q. How do you know?
A. Because I can see it.
Q. What do you see?
(depth first search)

Q. Why did you turn left here?
A. Because I need gas.
Q. Why did you turn left *here*?
A. Because there is a gas station.
Q. Why did you turn left now?
A. Because there is an opening in the traffic.
(breadth first search)

It's not that we can't do it in theory. It's that we can't do it in practice. The human brain is not a Turing machine. It has finite time and memory limits.

-- Matt Mahoney, [EMAIL PROTECTED]
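The two questioning orders in the dialog correspond to depth-first and breadth-first traversal of a "why" graph. A toy sketch (the graph and its node names are illustrative only, not any real knowledge representation):

```python
from collections import deque

# Hypothetical causal chain behind "Why did you turn left?"
why = {
    "turn left": ["need gas", "gas station here", "opening in traffic"],
    "need gas": ["tank almost empty"],
    "tank almost empty": ["needle on E"],
    "needle on E": ["I can see it"],
}

def explain_dfs(node, depth=0):
    """Depth-first: keep asking 'why?' down one chain of reasons."""
    lines = ["  " * depth + node]
    for reason in why.get(node, []):
        lines += explain_dfs(reason, depth + 1)
    return lines

def explain_bfs(node):
    """Breadth-first: list all immediate reasons before going deeper."""
    order, queue = [], deque([node])
    while queue:
        n = queue.popleft()
        order.append(n)
        queue.extend(why.get(n, []))
    return order

print("\n".join(explain_dfs("turn left")))
print(explain_bfs("turn left"))
```

Matt's objection is about scale, not mechanism: the traversal is trivial on a seven-node toy but hopeless when every fact depends on millions of others.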
Re: [agi] A question on the symbol-system hypothesis
Again, do not confuse the two compressions. In paq8f (on which paq8hp5 is based) I use lossy pattern recognition (like you describe, but at a lower level) to extract features to use as context for text prediction. The lossless compression is used to evaluate the quality of the prediction.

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 1:41:41 PM Subject: Re: [agi] A question on the symbol-system hypothesis

The main first subtitle: Compression is Equivalent to General Intelligence. Unless your definition of Compression is not the simple large amount of text turning into the small amount of text. And likewise with General Intelligence. I don't think, under any of the many many definitions I have seen or created, that text or a compressed thing can possibly be considered general intelligence. Another way: data != knowledge != intelligence. Intelligence requires something else. I would say an actor.

Now I would agree that highly compressed, lossless data could represent a good knowledge base. Yeah, that goes good. But quite simply, a lossy one provides a *better* knowledge base, with two examples:

1. "Poison ivy causes an itching rash for most people" and "poison oak: The common effect is an irritating, itchy rash" can be generalized or combined to: "poison oak and poison ivy cause an itchy rash." Which is shorter, and lossy, yet better for this fact.

2. If I see something in the road with four legs, and I'm about to run it over, if I only have rules that say "if a deer or dog runs in the road, don't hit it", then I can't correctly act, because I only know there is something with 4 legs in the road. However, if I have a generalized rule in my mind that says "If something with four legs is in the road, avoid it", then I have a better rule. This better rule cannot be gathered without generalization, and we have to have lots of generalization.
The generalizations can be invalidated with exceptions, and we do it all the time; that's how we can tell not to pet a skunk instead of a cat.

James Ratcliff - http://falazar.com
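Matt's "two compressions" split can be sketched in a few lines: a lossy step reduces the recent input to a context (a crude stand-in for paq8f's pattern recognition; the real program mixes many models), and the lossless code length log2(1/p) scores the resulting predictions. An illustrative toy, not paq8f's actual algorithm:

```python
import math
from collections import defaultdict

def compressed_size_bits(text, order=3):
    """Adaptive order-n byte model; returns total code length in bits."""
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]      # lossy feature: last n chars
        seen = counts[ctx]
        n = sum(seen.values())
        # Laplace-smoothed probability of the actual next character
        p = (seen[ch] + 1) / (n + 256)
        total_bits += math.log2(1.0 / p)     # lossless cost of coding ch
        seen[ch] += 1
    return total_bits

boring = "abcabcabc" * 50
print(compressed_size_bits(boring) / len(boring))   # well under 8 bpc
```

The better the (lossy) features predict the next symbol, the smaller the (lossless) total; that is the sense in which compression measures prediction quality.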
Re: [agi] RSI - What is it and how fast?
I think this is a topic for the singularity list, but I agree it could happen very quickly. Right now there is more than enough computing power on the Internet to support superhuman AGI. One possibility is that it could take the form of a worm. http://en.wikipedia.org/wiki/SQL_slammer_(computer_worm) An AGI of this type would be far more dangerous because it could analyze code, discover large numbers of vulnerabilities and exploit them all at once. As the Internet gets bigger, faster, and more complex, the risk increases. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Hank Conn [EMAIL PROTECTED] To: agi agi@v2.listbox.com Sent: Thursday, November 16, 2006 3:33:08 PM Subject: [agi] RSI - What is it and how fast? Here are some of my attempts at explaining RSI... (1) As a given instance of intelligence, as defined as an algorithm of an agent capable of achieving complex goals in complex environments, approaches the theoretical limits of efficiency for this class of algorithms, intelligence approaches infinity. Since increasing computational resources available for an algorithm is a complex goal in a complex environment, the more intelligent an instance of intelligence becomes, the more capable it is in increasing the computational resources for the algorithm, as well as more capable in optimizing the algorithm for maximum efficiency, thus increasing its intelligence in a positive feedback loop. (2) Suppose an instance of a mind has direct access to some means of both improving and expanding both the hardware and software capability of its particular implementation. Suppose also that the goal system of this mind elicits a strong goal that directs its behavior to aggressively take advantage of these means. 
Given each increase in capability of the mind's implementation, it could (1) increase the speed at which its hardware is upgraded and expanded, (2) more quickly, cleverly, and elegantly optimize its existing software base to maximize capability, (3) develop better cognitive tools and functions more quickly and in more quantity, and (4) optimize its implementation on successively lower levels by researching and developing better, smaller, more advanced hardware. This would create a positive feedback loop: the more capable its implementation, the more capable it is of improving its implementation. How fast could RSI plausibly happen? Is RSI inevitable, and how soon will it be? How do we truly maximize the benefit to humanity? It is my opinion that this could happen extremely quickly once a completely functional AGI is achieved. I think it's plausible it could happen against the will of the designers (and go on to pose an existential risk), and quite likely that it would move along quite well with the designers' intentions; however, this opens the door to existential disasters in the form of so-called Failures of Friendliness. I think it's fairly implausible the designers would suppress this process, except those that are concerned about completely working out issues of Friendliness in the AGI design.
Re: [agi] One grammar parser URL
YKY (Yan King Yin) [EMAIL PROTECTED] wrote: Any suggestions on how to make my project more popular? Clearly state the problem you want to solve. Don't just build AGI for the sake of building it. Do you think it is good practice to attach frames to *words*, or rather to *situations*? Neither. I think logical inference should be built into the language model. The hard part is not the inference (if you keep the chain of reasoning short) but converting natural language statements about facts, relations, and queries into mathematical form and back. But I don't think this is even necessary. Most people can reason informally without converting statements into first order logic. A language model should develop this capability first. Learning logic is similar to learning grammar. A statistical model can classify words into syntactic categories by context, e.g. "the X is" tells you that X is a noun, and that it can be used in novel contexts where other nouns have been observed, such as "a X was". At a somewhat higher level, you can teach logical inference by giving examples such as: All men are mortal. Socrates is a man. Therefore Socrates is mortal. All birds have wings. Tweety is a bird. Therefore Tweety has wings. These fit a pattern allowing you to complete the paragraph: All X are Y. Z is a X. Therefore... And likewise for other patterns that are taught in a logic class, e.g. If X then Y. Y is false. Therefore... Finally you give examples in formal notation and their English equivalents, such as (X => Y) ^ ~Y, and again use statistical modeling to learn the substitution rules to do these conversions. To get to this point I think you will first need to train the language model to detect higher level grammatical structures such as phrases and sentences, not just word categories. I believe this can be done using a neural model.
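The idea of completing inference paragraphs by pattern substitution can be sketched in a few lines. This is only an illustration of the template-matching step, not a statistical learner; the pattern strings and function names are invented for the example.

```python
import re

# Hypothetical inference templates, as would be learned from worked examples.
# Each pairs a surface pattern with a rule for producing the conclusion.
PATTERNS = [
    # "All X are Y. Z is a X." -> "Therefore Z is Y."
    (r"All (?P<x>\w+) are (?P<y>\w+)\. (?P<z>\w+) is an? (?:\w+)\.",
     lambda m: f"Therefore {m['z']} is {m['y']}."),
    # "All X have Y. Z is a X." -> "Therefore Z has Y."
    (r"All (?P<x>\w+) have (?P<y>\w+)\. (?P<z>\w+) is an? (?:\w+)\.",
     lambda m: f"Therefore {m['z']} has {m['y']}."),
]

def complete(paragraph: str) -> str:
    """Complete the 'Therefore ...' sentence for a known pattern."""
    for pattern, conclude in PATTERNS:
        m = re.match(pattern, paragraph)
        if m:
            return conclude(m)
    return "(no known pattern)"

print(complete("All men are mortal. Socrates is a man."))
# -> Therefore Socrates is mortal.
```

A real language model would have to induce such templates statistically from many examples rather than have them hand-coded, which is the hard part the post is pointing at.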
This has been attempted using connectionist models, where neurons represent features at different levels of abstraction, such as letters, words, parts of speech, phrases, and sentence structures, in addition to time delayed copies of these. A problem with connectionist models is that each word or concept is assigned to a single neuron, so there is no biologically plausible mechanism for learning new words. A more accurate model is one in which each concept is correlated with many neurons to varying degrees, and each neuron is correlated with many concepts. Then we have a mechanism, which is to shift a large number of neurons slightly toward a new concept. Except for this process, we can still use the connectionist model as an approximation to help us understand the true model, with the understanding that a single weight in the model actually represents a large number of connections. I believe the language learning algorithm is essentially Hebb's model of classical conditioning, plus some stability constraints in the form of lateral inhibition and fatigue. Right now this is still research. I have no experimental results to show that this model would work. It is far from developed. I hope to test it eventually by putting it into a text compressor. If it does work, I don't know if it will train to a high enough level to solve logical inference, at least not without some hand written training data or a textbook on logic. If it does reach this point, we would show that examples of correct inference compress smaller than incorrect examples. To have it answer questions I would need to add a model of discourse, but that is a long way off. Most training text is not interactive, and I would need about 1 GB. Maybe you have some ideas?
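A toy sketch of the learning rule described above: Hebbian strengthening of the most active unit, with lateral inhibition pushing rival units away for stability. The sizes, rates, and inputs below are arbitrary illustrations (not from any experiment), and concepts are distributed over several units rather than one neuron each.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 8, 4
# Small random weights; each unit participates in many "concepts"
W = rng.normal(0.0, 0.1, (n_units, n_inputs))

def step(x, lr=0.05):
    """One Hebbian update with lateral inhibition."""
    y = W @ x
    winner = int(np.argmax(y))
    # Hebb: pull the winner's weights toward the input; the decay term
    # (x - W[winner]) keeps weights bounded, playing the role of fatigue
    W[winner] += lr * (x - W[winner])
    # Lateral inhibition: push the losing units slightly away from this input
    for i in range(n_units):
        if i != winner:
            W[i] -= 0.1 * lr * x
    return winner

# Two orthogonal input patterns standing in for two concepts
a = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])
b = np.array([0.0, 0, 0, 0, 1, 1, 1, 1])
for _ in range(50):
    step(a)
    step(b)
print("winner for a:", np.argmax(W @ a), "winner for b:", np.argmax(W @ b))
```

After training, the winning unit for each pattern responds strongly to it while inhibited units do not, which is the competitive specialization the post's stability constraints are meant to produce.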
-- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, November 16, 2006 7:17:55 PM Subject: Re: [agi] One grammar parser URL On 11/16/06, James Ratcliff [EMAIL PROTECTED] wrote: Correct: using inferences only works in toy or small, well-understood domains, as inevitably when it goes 2+ steps away from direct knowledge it will be making large assumptions and be wrong. My thoughts have been on an AISim as well, but I am laying out the works for it to be massively available to many users. How many people are actively working with the AGISim, or do you expect to be, and do you feel that small set of user interaction with it will produce enough experience to advance the AI knowledge base? My presumption is to make the final AISim a simple enough but interesting interface to allow any number of potential users to interact, teach, and play with the bots inside. I have a very very basic, open structure for the knowledge base, and allow users to tweak and change the actual action functions available and create and remove items in the world to interact with. I wish to create a massively popular AI platform as well =) But my take would
Re: [agi] One grammar parser URL
I don't see that the requirement for deterministic computation should be an obstacle to building a language model on a computer. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 17, 2006 9:40:41 AM Subject: Re: [agi] One grammar parser URL Not quite gonna work that way, unfortunately. (I think) The 10^9 figure you used was the compressed amount of data for a lifetime of a human. You can't just give an NN that much data; you have to give it 10^9*X amount of data. The NN will need many times that amount of training data to get the final results, to boil down to a final 10^9 amount of data. You may even be able to show that a computer can still process the new larger amount in a reasonable lifetime. Now the hard part: how do you generate that much information? Real, correct experiential information? You can't, unfortunately. I have over 600 novels right now for mine. But the amount of compressed actual knowledge in all those... maybe a couple novels' worth, and it is very very hard to even compress and learn that small amount of information. This NN line of reasoning I haven't seen proven effective on the total task of AGI, though it is wonderful for the smaller component modules like vision / motor control, and is necessary there. Humans learn with a very small amount of information, and I really think we must model the AGI after this in some fashion, with the caveat that it is possible to train the AGI with 1,000 or a million people instead of just a small team of one or two, by distributing the learning activities throughout the internet. But that still gives us a very small sample size of the entirety of world experience. James Ratcliff
Re: [agi] A question on the symbol-system hypothesis
When I refer to a quantity of information, I mean its algorithmic complexity, the size of the smallest program that generates it. So yes, the Mandelbrot set contains very little information. I realize that algorithmic complexity is not computable in general. When I express AI or language modeling in terms of compression, I mean that the goal is to get as close to this unobtainable limit as possible. Algorithmic complexity can apply to either finite or infinite series. For example, the algorithmic complexity of a string of n zero bits is log n + C for some constant C that depends on your choice of universal Turing machine. The complexity of an infinite string of zero bits is a (small) constant C. When I talk about Kauffman's assertion that complex systems evolve toward the boundary between stability and chaos, I mean a discrete approximation of these concepts. These are defined for dynamic systems in real vector spaces controlled by differential equations. (For continuous systems, chaos requires at least 3 dimensions.) A system is chaotic if its Lyapunov exponent is positive, and stable if it is negative. Extensions to discrete systems have been described. For example, the logistic map x := rx(1 - x), 0 < x < 1, goes from stable to chaotic as r grows from 0 to 4. For discrete spaces, pseudo random number generators are simple examples of chaotic systems. Kauffman studied chaos in large discrete systems (state machines with randomly connected logic gates) and found that the systems transition from stable to chaotic as the number of inputs per gate is increased from 2 to 3. At the boundary, the number of discrete attractors (repeating cycles) is about the square root of the number of variables. Kauffman noted that gene regulation can be modeled this way (gene combinations turn other genes on or off) and that the number of human cell types (254) is about the square root of the number of genes (he estimated 100K, but actually about 30K). I noted (coincidentally?)
that vocabulary size is about the square root of the size of a language model. The significance of this to AI is that I believe it bounds the degree of interconnectedness of knowledge. It cannot be so great that small updates to the AI result in large changes in behavior. This places limits on what we can build. For example, in a neural network with feedback loops, the weights would have to be kept small. We should not confuse symbols with meaning. A language model associates patterns of symbols with other patterns of symbols. It is not grounded. A model does not need vision to know that the sky is blue. They are just words. I believe that an ungrounded model (plus a discourse model, which has a sense of time and who is speaking) can pass the Turing test. I don't believe all of the conditions are in place for a hard takeoff yet. You need: 1. Self replicating computers. 2. AI smart enough to write programs from natural language specifications. 3. Enough hardware on the Internet to support AGI. 4. Execute access. Where we stand on each: 1. Computer manufacturing depends heavily on computer automation, but you still need humans to make it all work. 2. AI language models are now at the level of a toddler, able to recognize simple sentences of a few words, but they can already learn in hours or days what takes a human years. 3. I estimate an adult level language model will fit on a PC but it would take 3 years to train it. A massively parallel architecture like Google's MapReduce could do it in an hour, but it would require a high speed network. A distributed implementation like GIMPS or SETI would not have enough interconnection speed to support a language model. I think you need about a 1 Gb/s connection with low latency to distribute it over a few hundred PCs. 4. Execute access is one buffer overflow away.
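The stable-to-chaotic transition of the logistic map mentioned above can be checked numerically. A rough sketch (parameter choices are illustrative): estimate the Lyapunov exponent by averaging log |f'(x)| along an orbit; it comes out negative in a stable regime and positive in a chaotic one.

```python
import math

def lyapunov(r, x=0.5, n=2000, burn=200):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x)."""
    total = 0.0
    for i in range(n):
        x = r * x * (1 - x)
        if i >= burn:  # skip the transient before averaging
            # the derivative of the map is r*(1 - 2x)
            total += math.log(abs(r * (1 - 2 * x)) + 1e-12)
    return total / (n - burn)

# Stable regime (r=2.5): orbit settles on a fixed point, exponent < 0.
# Chaotic regime (r=3.9): nearby orbits diverge, exponent > 0.
print(lyapunov(2.5) < 0 < lyapunov(3.9))  # -> True
```

The same sign test extends to Kauffman's random Boolean networks, where divergence of nearby states plays the role of the derivative.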
-- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Mike Dougherty [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, November 18, 2006 1:32:05 AM Subject: Re: [agi] A question on the symbol-system hypothesis I'm not sure I follow every twist in this thread. No... I'm sure I don't follow every twist in this thread. I have a question about this compression concept. Compute the number of pixels required to graph the Mandelbrot set at whatever detail you feel to be sufficient for the sake of example. Now describe how this 'pattern' is compressed. Of course the ideal compression is something like 6 bytes. Show me a 6 byte jpg of a Mandelbrot :) Is there a concept of compression of an infinite series? Or was the term bounding being used to describe the attractor around which the values tend to fall? Chaotic attractor, statistical median, etc. seem to be describing the same tendency of human pattern recognition across different types of data. Is a 'symbol' an idea, or a handle on an idea? Does this support the mechanics of how concepts can be built from agreed-upon ideas
Re: [agi] A question on the symbol-system hypothesis
I think your definition of understanding is in agreement with what Hutter calls intelligence, although he stated it more formally in AIXI. An agent and an environment are modeled as a pair of interactive Turing machines that pass symbols back and forth. In addition, the environment passes a reward signal to the agent, and the agent has the goal of maximizing the accumulated reward. The agent does not, in general, have a model of the environment, but must learn it. Intelligence is presumed to be correlated with a greater accumulated reward (perhaps averaged over a Solomonoff distribution of all environments). -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: James Ratcliff [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, November 18, 2006 7:42:19 AM Subject: Re: [agi] A question on the symbol-system hypothesis Have to amend that to acts or replies, and it could react unpredictably depending on the human's level of understanding. If it sees a nice neat answer (like jumping through the window because the door was blocked) that the human wasn't aware of, or was surprised about, it would be equally good. And this doesn't cover the opposite, of what other actions can be done and what the consequences are; that is also important. And lastly this is for a situation only; we also have the more general case about understanding a thing: when it sees, has, or is told about a thing, it understands it if it knows about general properties, and actions that can be done with, or using, the thing. The main thing being we can't and aren't really defining understanding, but the effect of the understanding, either in action or in a language reply. And it should be a level of understanding, not just a y/n. So if one AI saw an apple and said, I can throw / cut / eat it, and weighted those ideas, and the second had the same list but weighted eat as more likely, and/or knew people sometimes cut it before eating it, then the second AI would understand to a higher level.
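Hutter's agent-environment formalism described above can be caricatured in a few lines. This is only a toy illustration of the interaction protocol (action out, percept and reward back), not AIXI itself; the echo environment and the agent's policy are invented for the example.

```python
import random

class Environment:
    """Toy environment: rewards the agent for repeating the last symbol sent."""
    def __init__(self):
        self.last = 0
    def step(self, action):
        reward = 1.0 if action == self.last else 0.0
        self.last = random.randint(0, 1)   # emit the next observation symbol
        return self.last, reward

class Agent:
    """Trivial agent with no built-in model: it echoes its last observation."""
    def __init__(self):
        self.obs = 0
    def act(self):
        return self.obs
    def observe(self, obs, reward):
        self.obs = obs

random.seed(0)
env, agent = Environment(), Agent()
total = 0.0
for _ in range(100):
    a = agent.act()              # agent -> environment
    obs, r = env.step(a)         # environment -> agent (symbol + reward)
    agent.observe(obs, r)
    total += r
print(total)  # echo policy is optimal for this environment: 100.0
```

In Hutter's framework, intelligence corresponds to how much reward a single agent can accumulate averaged over a Solomonoff-weighted distribution of such environments, not just one.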
Likewise if instead one knew you could bake an apple pie, or that apples come from apple trees, he would understand more. So it starts looking like a knowledge test then. Maybe we could extract simple facts from wiki, and start creating a test there, then add in more complicated things. James Charles D Hixson [EMAIL PROTECTED] wrote: Ben Goertzel wrote: ... On the other hand, the notions of intelligence and understanding and so forth being bandied about on this list obviously ARE intended to capture essential aspects of the commonsense notions that share the same word with them. ... Ben Given that purpose, I propose the following definition: A system understands a situation that it encounters if it predictably acts in such a way as to maximize the probability of achieving its goals in that situation. I'll grant that it's a bit fuzzy, but I believe that it captures the essence of the visible evidence of understanding. This doesn't say what understanding is, merely how you can recognize it.
Re: [agi] new paper: What Do You Mean by AI?
Pei, you classified NARS as a principle-based AI. Are there any others in that category? What about Novamente? -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Pei Wang [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Friday, November 17, 2006 11:51:58 AM Subject: [agi] new paper: What Do You Mean by AI? Hi, A new paper of mine is put online for comment. English corrections are also welcome. You can either post to this mailing list or send me private emails. Thanks in advance. Pei --- TITLE: What Do You Mean by AI? ABSTRACT: Many problems in AI study can be traced back to the confusion of different research goals. In this paper, five typical ways to define AI are clarified, analyzed, and compared. It is argued that though they are all legitimate research goals, they lead the research in very different directions. Furthermore, most of them have trouble giving AI a proper identity. URL: http://nars.wang.googlepages.com/wang.AI_Definitions.pdf
Re: Re: [agi] Understanding Natural Language
My point about artificial languages is that I don't believe they are of much use in helping to understand or solve the natural language modeling problem, which is a central problem of AGI. Ben mentioned one use, which is to use Lojban++ in combination with English to train an AGI in English. In this case, Lojban++ serves to help ground the language, just as a 3-D modeling language could also be used to describe the environment. Here, any language which is expressive enough to do this and is familiar to the developer will do. It is a different case where we require users to learn an artificial language because we don't know how to model natural language. I don't see how this can lead to any significant insights. There are already many examples of unambiguous and easy to parse programming languages (including superficially English-like languages such as COBOL and SQL) and formal knowledge representation languages (CycL, Prolog, etc.). An AGI has to deal with ambiguity and errors in language. Consider the following sentence which I used earlier: I could even invent a new branch of mathematics, introduce appropriate notation, and express ideas in it. What does "it" refer to? The solution in an artificial language would be either to forbid pronouns (as in most programming languages) or to explicitly label "it" to make the meaning explicit. But people don't want or need to do this. They can figure it out by context. If your AGI can't use context to solve such problems then you haven't solved the natural language modeling problem, and a vast body of knowledge will be inaccessible. I think you will find that writing a Lojban parser is trivial compared to writing an English to Lojban translator. Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] wrote: My initial reasoning was that right now many programs don't use AI, because programmers don't know, and the ones that do can't easily add code. It is because language modeling is unsolved.
Computers would be much easier to use if we could talk to them in English. But they do not understand. We don't know how to make them understand. But we are making progress. Google will answer simple, natural language questions (although they don't advertise it). The fact that others haven't done it suggests the problem requires vast computational resources and training data. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Sunday, November 26, 2006 4:37:02 PM Subject: Re: Re: [agi] Understanding Natural Language On 11/25/06, Matt Mahoney [EMAIL PROTECTED] wrote: Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] wrote: Even if we were able to constrain the grammar, you still have the problem that people will still make ungrammatical statements, misspell words, omit words, and so on. Amazing you should mention such valid points against natural languages. This misses the point. Where are you going to get 1 GB of Lojban text to train your language model? Well, A: I could just get IRC logs and mailing lists of the current Lojban community. B: the point is to translate English into Lojban. C: I'm not training a language model. I'm creating a parser, then a translator, then other things. The translator will have some elements of an AI; probably Bayesian probability will be involved, but it's too early to say. I may be on the wrong list discussing this. If you require that all text pass through a syntax checker for errors, you will greatly increase the cost of generating your training data. Well, A: There are rarely any errors -- unlike in a natural language like, say, English. B: Addressed above. This is not a trivial problem. Which one? Maybe as a whole it's not trivial, but when you break it down the little pieces are all individually trivial. It is a big part of why programmers can only write 10 lines of code per day on projects 1/1000 the size of a language model.
Monolithic programming is the paradigm of the past, which is one of the reasons I'm creating this new development model. Then when you have built the model, you will still have a system that is intolerant of errors and hard to use. Because of the nature of the development model -- designed after functional programming languages -- I'm going to be able to add functions anywhere in the process without interrupting the rest of the functions, as it won't be changing the input other functions receive (unless that is the intent). Hard to use? Well, we'll see when I have a basic implementation; the whole point is so that it will be easy to use. Maybe it won't work out, though -- I can't see how. .iacu'i(skepticism) Your language model needs to have a better way to deal with inconsistency than to report errors and make more work for the user. It can easily just check what the previous response of this user, or someone else that has made a similar error, was when correcting. Trivial once we get
Re: [agi] Understanding Natural Language
Philip Goetz [EMAIL PROTECTED] wrote: The use of predicates for representation, and the use of logic for reasoning, are separate issues. I think it's pretty clear that English sentences translate neatly into predicate logic statements, and that such a transformation is likely a useful first step for any sentence-understanding process. I don't think it is clear at all. Try translating some poetry. Even for sentences that do have a clear representation in first order logic, the translation from English is not straightforward at all. It is an unsolved problem. I also dispute that it is even useful for sentence understanding. Google understands simple questions, and its model is just a bag of words. Attempts to apply parsing or reasoning to information retrieval have generally been a failure. It would help to define what sentence understanding means. I say a computer understands English if it can correctly assign probabilities to long strings, where "correct" means ranked in the same order as judged by humans. So a program that recognizes the error in the string "the cat caught a moose" could be said to understand English. Thus, the grammar checker in Microsoft Word would have more understanding of a text document than a simple spell checker, but less understanding than most humans. Maybe you have a different definition. A reasonable definition for AI should be close to the conventional meaning and also be testable without making any assumption about the internals of the machine. Now it seems to me that you need to understand sentences before you can translate them into FOL, not the other way around. Before you can translate to FOL you have to parse the sentence, and before you can parse it you have to understand it, e.g. "I ate pizza with pepperoni." "I ate pizza with a fork." Using my definition of understanding, you have to recognize that "ate with a fork" and "pizza with pepperoni" rank higher than "ate with pepperoni" and "pizza with a fork".
A parser needs to know millions of rules like this. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 28, 2006 2:47:41 PM Subject: Re: [agi] Understanding Natural Language On 11/24/06, J. Storrs Hall, PhD. [EMAIL PROTECTED] wrote: On Friday 24 November 2006 06:03, YKY (Yan King Yin) wrote: You talked mainly about how sentences require vast amounts of external knowledge to interpret, but it does not imply that those sentences cannot be represented in (predicate) logical form. Substitute "bit string" for "predicate logic" and you'll have a sentence that is just as true and not a lot less useful. I think there should be a working memory in which sentences under attention would bring up other sentences by association. For example, if a person is being kicked is in working memory, that fact would bring up other facts such as being kicked causes a person to feel pain and possibly to get angry, etc. All this is orthogonal to *how* the facts are represented. Oh, I think the representation is quite important. In particular, logic lets you in for gazillions of inferences that are totally inapropos, with no good way to say which is better. Logic also has the enormous disadvantage that you tend to have frozen the terms and levels of abstraction. Actual word meanings are a lot more plastic, and I'd bet internal representations are damn near fluid. The use of predicates for representation, and the use of logic for reasoning, are separate issues. I think it's pretty clear that English sentences translate neatly into predicate logic statements, and that such a transformation is likely a useful first step for any sentence-understanding process. Whether those predicates are then used to draw conclusions according to a standard logic system, or are used as inputs to a completely different process, is a different matter.
The open questions are representation -- I'm leaning towards CSG in Hilbert spaces at the moment, but that may be too computationally demanding -- and how to form abstractions. Does CSG = context-sensitive grammar in this case? How would you use Hilbert spaces?
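Matt's operational test of understanding above (ranking strings the way humans would) can be sketched with a toy bigram model; note the connection to compression, since an ideal coder spends log2(1/p) bits on a string of probability p. The counts below are made up purely for illustration of the mechanism.

```python
import math

# Hypothetical toy bigram counts standing in for a trained language model
counts = {
    ("the", "cat"): 90, ("cat", "caught"): 50, ("caught", "a"): 80,
    ("a", "mouse"): 60, ("a", "moose"): 1,
}
total = sum(counts.values())

def logprob(sentence):
    """Score a sentence by its summed smoothed bigram log2-probabilities."""
    words = sentence.lower().split()
    lp = 0.0
    for pair in zip(words, words[1:]):
        p = (counts.get(pair, 0) + 1) / (total + 1000)  # add-one smoothing
        lp += math.log2(p)
    return lp  # an ideal compressor would spend -lp bits on this sentence

likely = logprob("the cat caught a mouse")
odd = logprob("the cat caught a moose")
print(likely > odd)  # model ranks the plausible sentence higher: True
```

A model that reproduces human rankings this way passes the understanding test regardless of its internals, which is exactly why it also compresses text well.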
Re: [agi] Understanding Natural Language
First order logic (FOL) is good for expressing simple facts like "all birds have wings" or "no bird has hair", but not for statements like "most birds can fly". To do that you have to at least extend it with fuzzy logic (probability and confidence). A second problem is, how do you ground the terms? If you have for all X, bird(X) => has(X, wings), where do "bird", "wings", and "has" get their meanings? The terms do not map 1-1 to English words, even though we may use the same notation. For example, you can talk about the wings of a building, or the idiom "wing it". Most words in the dictionary list several definitions that depend on context. Also, words gradually change their meaning over time. I think FOL represents complex ideas poorly. Try translating what you just wrote into FOL and you will see what I mean. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, November 28, 2006 5:45:51 PM Subject: Re: [agi] Understanding Natural Language Oops, Matt actually is making a different objection than Josh. Now it seems to me that you need to understand sentences before you can translate them into FOL, not the other way around. Before you can translate to FOL you have to parse the sentence, and before you can parse it you have to understand it, e.g. "I ate pizza with pepperoni." "I ate pizza with a fork." Using my definition of understanding, you have to recognize that "ate with a fork" and "pizza with pepperoni" rank higher than "ate with pepperoni" and "pizza with a fork". A parser needs to know millions of rules like this. Yes, this is true. When I said neatly, I didn't mean easily. I meant that the correct representation in predicate logic is very similar to the English, and doesn't lose much meaning. It was misleading of me to say that it's a good starting point, though, since you do have to do a lot to get those predicates. A predicate representation can be very useful.
This doesn't mean that you have to represent all of the predications that could be extracted from a sentence. The NLP system I'm working on does not, in fact, use a parse tree, for essentially the reasons Matt just gave. It doesn't want to make commitments about grammatical structure, so instead it just groups things into phrases, without deciding what the dependencies are between those phrases, and then has a bunch of different demons that scan those phrases looking for particular predications. As you find predications in the text, you can eliminate certain choices of lexical or semantic category for words, and eliminate arguments so that they can't be re-used in other predications. You never actually find the correct parse in our system, but you could if you wanted to. It's just that, we've already extracted the meaning that we're interested in by the time we have enough information to get the right parse, so the parse tree isn't of much use. We get the predicates that we're interested in, for the purposes at hand. We might never have to figure out whether pepperoni is a part or an instrument, because we don't care.
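Matt's point earlier in the thread, that FOL must at least be extended with probability and confidence to express statements like "most birds can fly", can be sketched as data plus one naive inference step. The rule table, the numbers, and the chaining scheme are all invented for illustration, not taken from any existing reasoner.

```python
# Hypothetical sketch: FOL-style implications annotated with
# (probability, confidence) pairs instead of being simply true or false.
rules = {
    # (category, predicate): (strength, confidence)
    ("bird", "fly"): (0.95, 0.9),       # "most birds can fly"
    ("penguin", "bird"): (1.0, 0.99),   # penguins are birds
    ("penguin", "fly"): (0.02, 0.9),    # specific rule overrides the general
    ("robin", "bird"): (1.0, 0.95),
}

def query(category, predicate):
    """Return (probability, confidence), preferring the most specific rule."""
    if (category, predicate) in rules:
        return rules[(category, predicate)]
    # one-step chaining through an intermediate category,
    # naively multiplying strengths and confidences
    for (a, b), (p1, c1) in rules.items():
        if a == category and (b, predicate) in rules:
            p2, c2 = rules[(b, predicate)]
            return (p1 * p2, c1 * c2)
    return (0.5, 0.0)  # total ignorance

print(query("penguin", "fly"))  # specific rule wins: (0.02, 0.9)
print(query("robin", "fly"))    # chained through bird(robin)
```

Even this toy version shows the two problems Matt raises: the numbers have to come from somewhere (grounding), and the chaining rule is an arbitrary choice that plain FOL never has to make.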
Re: [agi] A question on the symbol-system hypothesis
So what is your definition of understanding? -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Philip Goetz [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, November 29, 2006 5:36:39 PM Subject: Re: [agi] A question on the symbol-system hypothesis On 11/19/06, Matt Mahoney [EMAIL PROTECTED] wrote: I don't think it is possible to extend the definition of understanding to machines in a way that would be generally acceptable, in the sense that humans understand understanding. Humans understand language. We don't generally say that animals in the wild understand their environment, although we do say that animals can be trained to understand commands. I generally say that animals in the wild understand their environment. If you don't, you are using a definition of understand that I don't understand. - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
--- Philip Goetz [EMAIL PROTECTED] wrote: On 11/30/06, James Ratcliff [EMAIL PROTECTED] wrote: One good one: Consciousness is a quality of the mind generally regarded to comprise qualities such as subjectivity, self-awareness, sentience, sapience, and the ability to perceive the relationship between oneself and one's environment. (Block 2004). Compressed: Consciousness = intelligence + autonomy I don't think that definition says anything about intelligence or autonomy. All it is is a lot of words that are synonyms for consciousness, none of which really mean anything. I think if you insist on an operational definition of consciousness you will be confronted with a disturbing lack of evidence that it even exists. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: The goal of humanity, like that of every other species, was determined by evolution. It is to propagate the species. That's not the goal of humanity. That's the goal of the evolution of humanity, which has been defunct for a while. We have slowed evolution through medical advances, birth control and genetic engineering, but I don't think we have stopped it completely yet. You are confusing this abstract idea of an optimization target with the actual motivation system. You can change your motivation system all you want, but you wouldn't (intentionally) change the fundamental specification of the optimization target which is maintained by the motivation system as a whole. I guess we are arguing terminology. I mean that the part of the brain which generates the reward/punishment signal for operant conditioning is not trainable. It is programmed only through evolution. To some extent you can do this. When rats can electrically stimulate their nucleus accumbens by pressing a lever, they do so nonstop in preference to food and water until they die. I suppose the alternative is to not scan brains, but then you still have death, disease and suffering. I'm sorry it is not a happy picture either way. Or you have no death, disease, or suffering, but not wireheading. How do you propose to reduce the human mortality rate from 100%? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Re: [agi] Language acquisition in humans: How bound up is it with tonal pattern recognition...?
--- Ben Goertzel [EMAIL PROTECTED] wrote: I think that our propensity for music is pretty damn simple: it's a side-effect of the general skill-learning machinery that makes us memetic substrates. Tunes are trajectories in n-space as are the series of motor signals involved in walking, throwing, hitting, cracking nuts, chipping stones, etc, etc. Once we evolved a general learn-to-imitate-by-observing ability it will get used for imitating just about anything. Well, Steve Mithen argues otherwise in his book, based on admittedly speculative interpretations of anthropological/archaeological evidence... He argues for the presence of a specialized tonal pattern recognition module in the human brain, and the specific consequences for language learning of the existence of such a module... -- Ben I believe that Desmond Morris (The Naked Ape) argued that we like music because babies that liked to listen to their mother's heartbeat had a survival advantage. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: --- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: I suppose the alternative is to not scan brains, but then you still have death, disease and suffering. I'm sorry it is not a happy picture either way. Or you have no death, disease, or suffering, but not wireheading. How do you propose to reduce the human mortality rate from 100%? Why do you ask? You seemed to imply you knew an alternative to brain scanning, or did I misunderstand? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
a connection was an attack, because the only information to tell that a connection was an attack was in the TCP packet contents, while my system looked only at packet headers. And yet, the system succeeded in placing about 50% of all attacks in the top 1% of suspicious connections. To this day, I don't know how it did it. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Re: Motivational Systems of an AI
--- Richard Loosemore [EMAIL PROTECTED] wrote: Matt Mahoney wrote: --- Richard Loosemore [EMAIL PROTECTED] wrote: I am disputing the very idea that monkeys (or rats or pigeons or humans) have a part of the brain which generates the reward/punishment signal for operant conditioning. This is behaviorism. I find myself completely at a loss to know where to start, if I have to explain what is wrong with behaviorism. Call it what you want. I am arguing that there are parts of the brain (e.g. the nucleus accumbens) responsible for reinforcement learning, and furthermore, that the synapses along the input paths to these regions are not trainable. I argue this has to be the case because an intelligent system cannot be allowed to modify its motivational system. Our most fundamental models of intelligent agents require this (e.g. AIXI -- the reward signal is computed by the environment). You cannot turn off hunger or pain. You cannot control your emotions. Since the synaptic weights cannot be altered by training (classical or operant conditioning), they must be hardwired as determined by your DNA. Pei has already spoken eloquently on many of these questions. Yes, and I agree with most of his comments. I need to clarify that the part of the motivational system that is not trainable is the one that computes top level goals such as hunger, thirst, pain, the need for sleep, reproductive drive, etc. I think we can agree on this. Regardless of training, everyone will get hungry if they don't eat. You can temporarily distract yourself from hunger, but a healthy person can't change this top level goal. If this were not true, obesity would not be such a problem, and instead you would see a lot of people starving themselves to death. I think the confusion is over learned secondary goals, such as seeking money to buy food, or education to get a better job. So in that context, I agree with most of your comments too. 
That all human learning can be reduced to classical and operant conditioning? Of course I am disputing this. This is the behaviorist idea that has been completely rejected by the cognitive science community since 1956. If you are willing to bend the meaning of the terms classical and operant conditioning sufficiently far from their origins, you might be able to make the idea more plausible, but that kind of redefinition is a little silly, and I don't see you trying to do that. How about if I call them supervised and unsupervised learning? Of course this is not helpful. What I am trying to do is understand how learning works in humans so it can be modeled in AGI. Classical conditioning (e.g. Pavlov) has a simple model proposed by Hebb in 1949. If neuron A fires followed by B after time t, then the weight from A to B is increased in proportion to AB/t (where A and B are activation levels). The dependence on A and B has been used in neural models long before synaptic weight changes were observed in animal brains. The factor 1/t (for t greater than a few hundred milliseconds) is supported by animal experiments. The model for reinforcement learning is not so clear. We can imagine several possibilities.

1. The weights of a neural network are randomly and temporarily varied. After a positive reinforcement, the changes become permanent. If negative, the changes are undone or made in the opposite direction.

2. The neuron activation level of B is varied by adding random noise, dB. After reinforcement r after time t, the weight change from A to B is proportional to A(dB)r/t.

3. There is no noise. Let dB be the rate of increase of B. The weight change is proportional to A(dB)r/t.

4. (as pointed out by Philip Goetz) http://www.iro.umontreal.ca/~lisa/pointeurs/RivestNIPS2004.pdf The weight change is proportional to AB(r-p), where p is the predicted reinforcement (trained by classical conditioning) and r is the actual reinforcement (tri-Hebbian model). 
And many other possibilities. We don't know what the brain uses. It might be a combination of these. From animal experiments we know that the learning rate is proportional to r/t, but not much else. From computer simulations, we know there is no best solution because it depends on the problem. So I would like to see an answer to this question. How does it work in the brain? How should it be done in AGI? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
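For concreteness, the Hebbian rule and model 4 (the tri-Hebbian rule) can be written out in a few lines. This is a toy sketch of the update equations as stated above, not a claim about what the brain actually does; the learning rate and all values are arbitrary:

```python
def hebbian_update(w, A, B, t, lr=0.01):
    """Classical conditioning (Hebb 1949): weight change proportional to
    A*B/t, where A and B are activation levels and t is the delay."""
    return w + lr * A * B / t

def tri_hebbian_update(w, A, B, r, p, lr=0.01):
    """Model 4 (tri-Hebbian): weight change proportional to A*B*(r - p),
    where r is actual reinforcement and p is the predicted reinforcement."""
    return w + lr * A * B * (r - p)

w = 0.5
w = hebbian_update(w, A=1.0, B=0.8, t=2.0)             # A fired, then B: strengthen
w = tri_hebbian_update(w, A=1.0, B=0.8, r=1.0, p=0.2)  # outcome better than predicted
```

Note that in the tri-Hebbian rule the weight change vanishes when reinforcement is fully predicted (r = p), which is what makes it a plausible account of why unexpected rewards drive learning.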
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Mark Waser [EMAIL PROTECTED] wrote: You cannot turn off hunger or pain. You cannot control your emotions. Huh? Matt, can you really not ignore hunger or pain? Are you really 100% at the mercy of your emotions? Why must you argue with everything I say? Is this not a sensible statement? Since the synaptic weights cannot be altered by training (classical or operant conditioning) Who says that synaptic weights cannot be altered? And there's endless irrefutable evidence that the sum of synaptic weights is certainly constantly altering by the directed die-off of neurons. But not by training. You don't decide to be hungry or not, because animals that could do so were removed from the gene pool. Is this not a sensible way to program the top level goals for an AGI? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] A question on the symbol-system hypothesis
Mark, Philip Goetz gave an example of an intrusion detection system that learned information that was not comprehensible to humans. You argued that he could have understood it if he tried harder. I disagreed and argued that an explanation would be useless even if it could be understood. If you use a computer to add up a billion numbers, do you check the math, or do you trust it to give you the right answer? My point is that when AGI is built, you will have to trust its answers based on the correctness of the learning algorithms, and not by examining the internal data or tracing the reasoning. I believe this is the fundamental flaw of all AI systems based on structured knowledge representations, such as first order logic, frames, connectionist systems, term logic, rule based systems, and so on. The evidence supporting my assertion is: 1. The relative success of statistical models vs. structured knowledge. 2. Arguments based on algorithmic complexity. (The brain cannot model a more complex machine). 3. The two examples above. I'm afraid that's all the arguments I have. Until we build AGI, we really won't know. I realize I am repeating (summarizing) what I have said before. If you want to tear down my argument line by line, please do it privately because I don't think the rest of the list will be interested. --- Mark Waser [EMAIL PROTECTED] wrote: Matt, Why don't you try addressing my points instead of simply repeating things that I acknowledged and answered and then trotting out tired old red herrings. As I said, your network intrusion anomaly detector is a pattern matcher. It is a stupid pattern matcher that can't explain its reasoning and can't build upon what it has learned. You, on the other hand, gave a very good explanation of how it works. Thus, you have successfully proved that you are an explaining intelligence and it is not. If anything, you've further proved my point that an AGI is going to have to be able to explain/be explained. 
- Original Message - From: Matt Mahoney [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Saturday, December 02, 2006 5:17 PM Subject: Re: [agi] A question on the symbol-system hypothesis --- Mark Waser [EMAIL PROTECTED] wrote: A nice story but it proves absolutely nothing . . . . . I know a little about network intrusion anomaly detection (it was my dissertation topic), and yes it is an important lesson. Network traffic containing attacks has a higher algorithmic complexity than traffic without attacks. It is less compressible. The reason has nothing to do with the attacks, but with arbitrary variations in protocol usage made by the attacker. For example, the Code Red worm fragments the TCP stream after the HTTP GET command, making it detectable even before the buffer overflow code is sent in the next packet. A statistical model will learn that this is unusual (even though legal) in normal HTTP traffic, but offer no explanation why such an event should be hostile. The reason such anomalies occur is that when attackers craft exploits, they follow enough of the protocol to make it work but often don't care about the undocumented conventions followed by normal servers and clients. For example, they may use lower case commands where most software uses upper case, or they may put unusual but legal values in the TCP or IP-ID fields or a hundred other things that make the attack stand out. Even if they are careful, many exploits require unusual commands or combinations of options that rarely appear in normal traffic and are therefore less carefully tested. So my point is that it is pointless to try to make an anomaly detection system explain its reasoning, because the only explanation is that the traffic is unusual. The best you can do is have it estimate the probability of a false alarm based on the information content. So the lesson is that AGI is not the only intelligent system where you should not waste your time trying to understand what it has learned. 
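The claim that the only explanation is "the traffic is unusual" can be made quantitative: under any probability model, the natural anomaly score of an event is its code length, -log2 p, in bits. A minimal sketch, assuming a simple frequency model over one header field (the class and field values here are illustrative, not any real detector):

```python
import math
from collections import Counter

class AnomalyScorer:
    """Scores events by surprisal: rare values get long codes (high scores)."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def train(self, value):
        self.counts[value] += 1
        self.total += 1

    def score(self, value):
        # Laplace-smoothed probability; score = -log2 p, in bits.
        p = (self.counts[value] + 1) / (self.total + len(self.counts) + 1)
        return -math.log2(p)

scorer = AnomalyScorer()
for v in ["GET"] * 98 + ["POST"] * 2:   # normal traffic: mostly upper-case GET
    scorer.train(v)

# A never-seen lower-case command scores as more suspicious than a rare
# but known one, which in turn scores higher than the common case.
assert scorer.score("get") > scorer.score("POST") > scorer.score("GET")
```

This also shows why the false-alarm probability estimate mentioned above is the best one can do: the score says only how improbable the event is under the model, never why it is hostile.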
Even if you understood it, it would not tell you anything. Would you understand why a person made some decision if you knew the complete state of every neuron and synapse in his brain? You developed a pattern-matcher. The pattern matcher worked (and I would dispute that it worked better than it had a right to). Clearly, you do not understand how it worked. So what does that prove? Your contention (or, at least, the only one that continues the previous thread) seems to be that you are too stupid to ever understand the pattern that it found. Let me offer you several alternatives: 1) You missed something obvious 2) You would have understood it if the system could have explained it to you 3) You would have understood it if the system had managed to losslessly convert
Re: Motivational Systems of an AI [WAS Re: [agi] RSI - What is it and how fast?]
--- Eric Baum [EMAIL PROTECTED] wrote: Matt --- Hank Conn [EMAIL PROTECTED] wrote: On 12/1/06, Matt Mahoney [EMAIL PROTECTED] wrote: The goal of humanity, like that of every other species, was determined by evolution. It is to propagate the species. That's not the goal of humanity. That's the goal of the evolution of humanity, which has been defunct for a while. Matt We have slowed evolution through medical advances, birth control Matt and genetic engineering, but I don't think we have stopped it Matt completely yet. I don't know what reason there is to think we have slowed evolution, rather than speeded it up. I would hazard to guess, for example, that since the discovery of birth control, we have been selecting very rapidly for people who choose to have more babies. In fact, I suspect this is one reason why the US (which became rich before most of the rest of the world) has a higher birth rate than Europe. Yes, but actually most of the population increase in the U.S. is from immigration. Population is growing the fastest in the poorest countries, especially Africa. Likewise, I expect medical advances in childbirth etc. are selecting very rapidly for multiple births (which once upon a time often killed off mother and child.) I expect this, rather than or in addition to the effects of fertility drugs, is the reason for the rise in multiple births. The main effect of medical advances is to keep children alive who would otherwise have died from genetic weaknesses, allowing these weaknesses to be propagated. Genetic engineering has not yet had much effect on human evolution, as it has in agriculture. We have the technology to greatly speed up human evolution, but it is suppressed for ethical reasons. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] The Singularity
--- John Scanlon [EMAIL PROTECTED] wrote: Alright, I have to say this. I don't believe that the singularity is near, or that it will even occur. I am working very hard at developing real artificial general intelligence, but from what I know, it will not come quickly. It will be slow and incremental. The idea that very soon we can create a system that can understand its own code and start programming itself is ludicrous. Any arguments? Not very soon, maybe 10 or 20 years. General programming skills will first require an adult level language model and intelligence, something that could pass the Turing test. Currently we can write program-writing programs only in very restricted environments with simple, well defined goals (e.g. genetic algorithms). This is not sufficient for recursive self improvement. The AGI will first need to be at the intellectual level of the humans who built it. This means sufficient skills to do research, and to write programs from ambiguous natural language specifications and have enough world knowledge to figure out what the customer really wanted. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: Re: [agi] A question on the symbol-system hypothesis
--- Ben Goertzel [EMAIL PROTECTED] wrote: Matt Mahoney wrote: My point is that when AGI is built, you will have to trust its answers based on the correctness of the learning algorithms, and not by examining the internal data or tracing the reasoning. Agreed... I believe this is the fundamental flaw of all AI systems based on structured knowledge representations, such as first order logic, frames, connectionist systems, term logic, rule based systems, and so on. I have a few points in response to this: 1) Just because a system is based on logic (in whatever sense you want to interpret that phrase) doesn't mean its reasoning can in practice be traced by humans. As I noted in recent posts, probabilistic logic systems will regularly draw conclusions based on synthesizing (say) tens of thousands or more weak conclusions into one moderately strong one. Tracing this kind of inference trail in detail is pretty tough for any human, pragmatically speaking... 2) IMO the dichotomy between logic based and statistical AI systems is fairly bogus. The dichotomy serves to separate extremes on either side, but my point is that when a statistical AI system becomes really serious it becomes effectively logic-based, and when a logic-based AI system becomes really serious it becomes effectively statistical ;-) I see your point that there is no sharp boundary between structured knowledge and statistical approaches. What I mean is that the normal software engineering practice of breaking down a hard problem into components with well defined interfaces does not work for AGI. We usually try things like: input text -- parser -- semantic extraction -- inference engine -- output text. The fallacy is believing that the intermediate representation would be more comprehensible than the input or output. That isn't possible because of the huge amount of data. In a toy system you might have 100 facts that you can compress down to a diagram that fits on a sheet of paper. 
In reality you might have a gigabyte of text that you can compress down to 10^9 bits. Whatever form this takes can't be more comprehensible than the input or output text. I think it is actually liberating to remove the requirement for transparency that was typical of GOFAI. For example, your knowledge representation could still be any of the existing forms but it could also be a huge matrix with billions of elements. But it will require a different approach to build, not so much engineering, but more of an experimental science, where you test different learning algorithms at the inputs and outputs only. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
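The numbers here are worth checking: a gigabyte of ASCII text is 8 x 10^9 bits, so compressing it to 10^9 bits is an 8:1 ratio, about 1 bit per character, which is in line with Shannon's estimate of the entropy of English. The arithmetic:

```python
text_bytes = 10**9               # one gigabyte of text, one byte per character
text_bits = text_bytes * 8
compressed_bits = 10**9          # the compressed representation above

ratio = text_bits / compressed_bits          # compression ratio
bits_per_char = compressed_bits / text_bytes # entropy estimate, bits/character
print(ratio, bits_per_char)                  # 8.0 1.0
```

Even at that compression, 10^9 bits is vastly more than a person can inspect, which is the point: the representation cannot be more comprehensible than the text it encodes.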
Re: [agi] Brain memory Map Article -
Prior to reading this article it was my belief that the purpose of dreaming (REM sleep) was to copy medium term (daily) memories from the hippocampus to long term memory in the cerebral cortex. REM sleep occurs in only 2 of the 3 orders of mammals: placentals (which include humans and rodents) and marsupials. Egg laying mammals such as the spiny anteater do not dream and have a much different brain structure. I find it a mystery why memories are played back at high speed in reverse order, with excitations in the cortex preceding those in the hippocampus, and that this occurs during non REM sleep. Perhaps this is part of a feedback loop to erase memories from the hippocampus after they have been copied. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Bob Mottram [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, December 19, 2006 8:45:34 AM Subject: Re: [agi] Brain memory Map Article - I was also reading that article. The place cell phenomenon has been known for many years. For a long time I've thought that sleep might be used for something other than just down time and cellular repair, and this research does seem to confirm that sleep has some functional role. It's interesting that memories are played back in reverse, which might suggest some form of back-propagation in which the brain is searching for the most likely causes of interesting events. On 18/12/06, James Ratcliff [EMAIL PROTECTED] wrote: Interesting article on how they really exhaustively mapped a rat's brain... The researchers could interpret the memories through electrodes inserted into the rats' brains, including into special neurons in the hippocampus. These neurons are known as place cells because each is activated when the rat passes a specific location, as if they were part of a map in the brain. The activation is so reliable that one can tell where a rat is in its cage by seeing which of its place cells is firing. 
http://www.nytimes.com/2006/12/18/science/18memory.html?ref=us James Ratcliff - http://falazar.com - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] teleoperated robots
One problem with teleoperation is latency delays. If the robot and operator are on opposite sides of the earth, then there is a round trip speed of light latency of 133 ms, enough to impair some operations like driving a car. As a practical matter, latency will be longer because of routing delays, slower than light transmission (fiber optic speed is 3/4 of c), and satellite links. One solution is a system that anticipates teleoperator commands using local sensory data, for example, a car that anticipates a steering or braking command when seeing an obstacle in the road. Notice what this gives us. First, teleoperation is a source of high quality training data. Second, we have a straightforward means of evaluating nonverbal AI control systems, just as we have text compression (perplexity) to evaluate language models. This gives us a continuous pathway to AGI, not just a pass/fail test like the Turing test. Third, as AI systems improve, we have an unobtrusive means of evaluating human teleoperators. Those whose commands are most predictable are rated highest. --- Bob Mottram [EMAIL PROTECTED] wrote: There should also be a rating facility, where the person receiving the telerobot service can provide feedback on how well the job had been done. High scoring teleoperators would be more likely to get work than ones who just picked your tools up and threw them around. Within a few years I think there will be much money to be made - not out of the robots themselves which will be fairly dumb devices - but in the teleoperation services which act as relays between the teleoperator and the service consumer. The teleop service provides a convenient mechanism for people to get paid for carrying out jobs remotely via the robot. One side effect of this is a truly global labour market. - Bob On 07/01/07, Neil H. 
[EMAIL PROTECTED] wrote: On 1/5/07, Olie Lamb [EMAIL PROTECTED] wrote: Well, I for one want a job assistant who can fetch things - what apprentices or surgical nurse-assistanty things are often called to do. Assistant: Please get me a Phillips head screwdriver and half-a-dozen 10mm screws A robot that could 1) Voice recognise instructions 2) Understand simple commands like Get me X, Hold this still, Return this... 3) Manoeuvre from your work space to your tool-store 4) Grab items from an appropriately set-up tool-store etc Would be pretty damn useful, and I see most of this as being feasible with current day tech. Sure, such an assistant would be pretty damn expensive, and less useful than a high-school-dropout apprentice/assistant (who can also run down the street and get you a sandwich), but this is a real, possible application for a robot. Actually, this makes me think that in the near-term (until automation catches up) there's a market for teleoperated robots. You could issue a request to the robot, which would get routed to a teleoperation company. Using an infrastructure somewhat like a call center (but hopefully with shorter delays) somebody would then be designated to handle your teleoperation until the task was complete. If teleoperation latency was important you could pay a premium to have the request routed to someplace in-country or in-state, otherwise you could have it routed to India or someplace else with lower labor costs. As tech progresses you could add in more automation, for things like grabbing specific objects, pathfinding, or handling other simple requests. Eventually it could get to the point where a human is only required for highly advanced procedures. Of course, there's potential privacy issues, but I'm sure somebody could figure out a solution for that. Thoughts? 
-- Neil -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
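The 133 ms round-trip figure from the teleoperation post above is easy to verify: antipodal points are half the Earth's circumference apart along the surface, and the signal must travel there and back. A quick check (constants approximate; the 3/4 c fiber speed is the figure used in the post):

```python
C_EARTH_KM = 40_075       # Earth's circumference, km
C_LIGHT_KM_S = 299_792    # speed of light in vacuum, km/s

one_way_km = C_EARTH_KM / 2              # surface distance between antipodes
rtt_s = 2 * one_way_km / C_LIGHT_KM_S    # round trip at c
print(round(rtt_s * 1000))               # 134 (the ~133 ms figure)

# In fiber at 3/4 the speed of light, the floor is higher still:
rtt_fiber_s = 2 * one_way_km / (0.75 * C_LIGHT_KM_S)
print(round(rtt_fiber_s * 1000))         # 178
```

Real routes add switching and queuing delay on top of this floor, which is why local anticipation of operator commands matters for tasks like driving.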
Re: [agi] SOTA
--- Bob Mottram [EMAIL PROTECTED] wrote: Ah, but is a thermostat conscious ? :-) Are humans conscious? It depends on your definition of consciousness, which is really hard to define. Does a thermostat want to keep the room at a constant temperature? Or does it just behave as if that is what it wants? -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Project proposal: MindPixel 2
--- Gabriel R [EMAIL PROTECTED] wrote: Also, if you can think of any way to turn the knowledge-entry process into a fun game or competition, go for it. I've been told by a few people working on similar projects that making the knowledge-providing process engaging and fun for visitors ended up being a lot more important (and difficult) than they'd expected. Cyc has a game like this called FACTory at http://www.cyc.com/ Its purpose is to help refine its knowledge base. It presents statements and asks you to rate them as true, false, don't know or doesn't make sense. For example:

- Most shirts are heavier than most appendixes.
- Pages are typically located in HVAC Chem Bio facilities.
- Terminals are typically located in studies.
- People perform or are involved in paying a mortgage more frequently than they perform or are involved in overbearing.
- Most BTU dozer blades are wider than most T-64 medium tanks.

The game exposes Cyc's shortcomings pretty quickly. Cyc seems to lack a world model and a language model. Sentences seem to be constructed by relating common properties of unrelated objects. The set of common properties is fairly small: size, weight, cost, frequency (for events), containment, etc. There does not seem to be any sense that Cyc understands the purpose or function of objects. The result is that context is no help in disambiguating terms that have more than one meaning, such as appendix, page, or terminal. A language model would allow a more natural grammar, such as People pay mortgages more often than they are overbearing. This example also exposes the fallacy of logical inference. Inference allows you to draw conclusions such as this, but why would you? Inference is not a good model of human thought. A good model would compare related objects. It might ask instead whether people make mortgage payments more frequently than they receive paychecks. The game gives no hint that Cyc understands such relations. Cyc has millions of hand coded assertions. 
It has taken over 20 years to get this far, and it seems we are not even close. This seems to be a problem with every knowledge representation based on labeled graphs (frame-slot, first order logic, connectionist, expert system, etc). Using English words to label the elements of your data structure does not substitute for a language model. Also, this labeling tempts you to examine and update the knowledge manually. We should know by now that there is just too much data to do this. -- Matt Mahoney, [EMAIL PROTECTED] - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
Re: [agi] Project proposal: MindPixel 2
--- Stephen Reed [EMAIL PROTECTED] wrote: I worked at Cycorp when the FACTory game was developed. The examples below do not reveal Cyc's knowledge of the assertions connecting these disparate concepts; rather, most show that the argument constraints of the compared terms are overly generalized. The exception is the example "Most BTU dozer blades are wider than most T-64 medium tanks", in which both concepts are specializations of Platform-Military. Download and examine concepts in OpenCyc, and Cyc's world model (or lack thereof by your standards) will be readily apparent. You need ResearchCyc, which has no license fee for research purposes, in order to evaluate its language model. -Steve

Thanks. I did take another look at Cyc, at least this talk by Lenat at Google. http://video.google.com/videoplay?docid=-7704388615049492068 In spite of Cyc's lack of success at AGI (so far), it is still the biggest repository of common sense knowledge. He explains how Cyc had tried machine learning approaches to acquiring such knowledge and why they failed. They knew early on that it would require a 1000 person-year effort to develop the knowledge base and proceeded anyway. Cyc has 3.2 million assertions, 300,000 concepts and 16,000 relations (is-a, contains, etc). They tried very hard to simplify the knowledge base, to keep these numbers small. Cyc is planning a Web interface to its knowledge base. If they make something useful, a 1000 person-year effort is nothing. Lenat briefly mentions Sergey's (one of Google's founders) goal of solving AI by 2020. I think if Google and Cyc work together on this, they will succeed.
Re: [agi] Project proposal: MindPixel 2
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote: I'm not an academic (left uni a couple years ago) so I can't get academic funding for this. If I can't start an AI business I'd have to entirely give up AI as a career. I hope you can understand these circumstances.

Aren't there companies looking for AI researchers? Google? Maybe another approach (the one I took) is to publish something innovative, and let people come to you. It won't make you rich, but I have so far gotten 3 small consulting jobs designing and writing data compression software or doing research, all from home, simply because people have seen my work on my website (PAQ compressor, large text benchmark, Hutter prize) or they just saw my posts on comp.compression. I never looked for any of this work. I make enough teaching at a nearby university as an adjunct, with lots of time off. I'm sure I could make more money if I wanted to work long hours in an office, but I don't need to.

PAQ introduced a new compression algorithm (context mixing) when PPM algorithms were the best known. PAQ would not have made it to the top of the benchmarks without the ideas, coding, and testing efforts of others working on it with no reward except name recognition. That would not have happened if it wasn't free (GPL open source). Even now, I'm sure nobody would pay even $20/copy when there is so much free competition. Other good compressors (Compressia, WinRK) have failed with this business model.

I think if you want to make a business out of AI, you are in for a lot of work. First you need something that is truly innovative, that does something that nobody else can do. What will that be? A search engine better than Google? A new operating system that understands natural language? A car that drives itself? A household servant robot? A program that can manage a company? A better spam detector? Text compression? Write down a well defined goal. Do research. What is your competition? How are your ideas better than what's been done?
Prove it (with benchmarks), and the opportunities will come. -- Matt Mahoney, [EMAIL PROTECTED]
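The context mixing idea mentioned above can be illustrated with a toy sketch (my own simplification for illustration, not PAQ's actual code): several models each predict the probability of the next bit, and a mixer combines their predictions in the logistic domain, adjusting weights by online gradient descent so that better models gain influence.

```python
import math

def stretch(p):
    """Inverse logistic: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic: map a stretched value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Toy logistic mixer over several bit-prediction models."""
    def __init__(self, n_models, lr=0.01):
        self.w = [0.0] * n_models   # one weight per model
        self.lr = lr
        self.st = [0.0] * n_models  # last stretched predictions

    def predict(self, probs):
        # Combine stretched model probabilities with a weighted sum.
        self.st = [stretch(p) for p in probs]
        return squash(sum(w * s for w, s in zip(self.w, self.st)))

    def update(self, p, bit):
        # Online gradient step on coding loss: error = actual - predicted.
        err = bit - p
        self.w = [w + self.lr * err * s for w, s in zip(self.w, self.st)]

# Two fixed "models": one expects mostly 1s, the other mostly 0s.
mixer = Mixer(2)
for bit in [1, 1, 1, 0, 1, 1, 1, 1]:     # data is mostly 1s
    p = mixer.predict([0.9, 0.3])
    mixer.update(p, bit)

# The mixer learns to trust the model whose predictions fit the data.
assert mixer.w[0] > mixer.w[1]
```

A real context-mixing compressor would feed the mixed probability to an arithmetic coder and select model predictions by context; this sketch only shows the weight-learning step.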
Re: [agi] Chaitin randomness
--- gts [EMAIL PROTECTED] wrote: We can imagine ourselves parsing the sequence, dividing it into two groups: 1) complex/disorderly subsequences not amenable to simple algorithmic derivation and 2) simple/orderly subsequences such as those above that are so amenable. Now, if I understand Chaitin's information-theoretic compressibility definition of randomness correctly (and I very likely do not), the simple/orderly subsequences in group 2) are compressible and so would count against the larger sequence in any compressibility measure of its randomness. If that is so then a maximally random sequence might be best considered as one that is at least slightly compressible. But this definition would be contrary to Chaitin's idea that maximally random sequences are incompressible! Not so. Any information you save by compressing the compressible bits of a random sequence is lost because you also have to specify the location of those bits. (You can use the counting argument to prove this). Also I don't believe there are two types of randomness (algorithmic and process-based). Process based randomness (flipping a coin, quantum mechanics, etc) exists only in the context of an observer, something with memory such as a sensor, computer, or brain. If an observer has insufficient knowledge (or memory) to model its environment exactly, then it must use a probabilistic model. Algorithmic theory places hard limits on what is computable. An observer cannot model an environment more (algorithmically) complex than itself. If an observer is part of the universe that it observes, then it must have fewer states than the universe that includes it. Therefore the universe must appear probabilistic to any observer within it, even if the universe is deterministic. I think Einstein's view of quantum mechanics (God does not play dice) makes more sense when viewed in this light. 
-- Matt Mahoney, [EMAIL PROTECTED]
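The counting argument invoked above can be checked directly for small n (this is the standard pigeonhole reasoning, sketched here for illustration): there are 2^n binary strings of length n but only 2^n - 1 binary strings shorter than n bits, so no lossless code can shorten every string.

```python
# Pigeonhole check of the counting argument.
def n_strings(n):
    """Number of binary strings of length exactly n."""
    return 2 ** n

def n_shorter_descriptions(n):
    """Number of binary strings of length < n: 1 + 2 + ... + 2^(n-1)."""
    return sum(2 ** k for k in range(n))

for n in range(1, 25):
    # Always one fewer short description than strings to describe.
    assert n_shorter_descriptions(n) == n_strings(n) - 1
```

Hence at least one string of every length is incompressible, and on average any bits saved on the compressible parts of a random sequence must be paid back in specifying where those parts are.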
Re: [agi] (video)The Future of Cognitive Computing
--- Eugen Leitl [EMAIL PROTECTED] wrote: On Mon, Jan 22, 2007 at 05:26:43PM -0800, Matt Mahoney wrote: The issues of consciousness have been discussed on the singularity list. These are hard questions. I'm not sure questions about anything as ill-defined as consciousness are meaningful. The question arises when we need to make moral decisions, such as is it moral to upload a human brain into software, then manipulate that data in arbitrary ways, e.g. simulate pain? I think consciousness is poorly defined because any attempt to define it leads to the conclusion that it does not exist. You know what consciousness is, but try to define it. 1. Consciousness is the little person in your head that observes everything you sense and decides everything you do. 2. Consciousness (or self awareness) is what makes you different than everyone else. 3. Consciousness is what makes the world today different than before you were born. 4. If an exact copy of you was made, atom for atom, replicating all of your memories and behavior, then the only distinction between you and your copy would be that you have a consciousness. But with any of these definitions, it becomes clear that there is no physical justification for consciousness. You believe that other people have consciousnesses because you know that you do, and others are like you. But there is no way to know for sure. How do you distinguish between a person who has self awareness and one who only behaves as if he or she does? Perhaps we can drop the insistence that consciousness exists. Then a possible definition would be any behavior consistent with a belief in self awareness or free will. But this has problems too. - Does a thermostat want to keep the room at a constant temperature, or does it only behave as if that is what it wants? (Ask this question about human behavior). I don't understand your question. It depends on your definition of want. 
I mean that if an agent has goal-directed behavior, then it behaves as if it wants to satisfy its goals. I use this example to show that goal-directed behavior is not a criterion for consciousness. Do animals have consciousness? Does an embryo? These questions are controversial. AGI will raise new controversies. -- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] Project proposal: MindPixel 2
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote: On 1/25/07, Ben Goertzel [EMAIL PROTECTED] wrote: If there is a major problem with Cyc, it is not the choice of basic KR language. Predicate logic is precise and relatively simple. I agree mostly, though I think even Cyc's simple predicate logic language can be made even simpler and better. For example, Cyc uses the classical quantifiers #$forAll and #$exists. In my version I don't use Frege-style quantifiers but I allow generalized modifiers like many, a few, in addition to all, exists.

IMHO the problem with Cyc is they tried to go directly to adult level intelligence with no theory on how people learn. This is why they are having such difficulty adding a natural language interface. Children learn semantics first, then simple sentences, and then the elements of logic such as and, or, not, all, some, etc. Cyc went straight to adult level logic and math, and now they can't add in the stuff that should have been learned as children. They should have built the language model first.

Another problem is that n-th order logic (even probabilistic) is not how people think. Logic does not model inductive reasoning, e.g. Kermit is a frog. Kermit is green. Therefore frogs are green. Where is the theory that explains why people reason this way? This is what happens when you ignore the cognitive side of AI.

Rather, the main problem is the impracticality of encoding a decent percentage of the needed commonsense knowledge! Now I see why we disagree here. You believe we should acquire all knowledge via experiential learning. IMO we can do even better than the experiential route. We can let the internet crowd enter the commonsense corpus for us. This should allow us to reach a functioning, usable AGI sooner.

How much knowledge you need depends on what problem you are trying to solve. Building an AGI to run a corporation is not the same as building a better spam detector.
-- Matt Mahoney, [EMAIL PROTECTED]
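The Kermit example above can be made concrete with a toy sketch (my own formulation, purely illustrative): the inductive leap people actually make generalizes from a single observed instance to a rule about the whole category, which deduction would never license.

```python
# Toy knowledge base of (subject, predicate) observations.
facts = {("kermit", "is_frog"), ("kermit", "is_green")}

def induce(facts):
    """Naive induction: if an instance of a category has a property,
    guess that the whole category has it. Logically unsound, but it
    captures the 'frogs are green' leap that logic does not model."""
    rules = set()
    for (x, cat) in facts:
        for (y, prop) in facts:
            if y == x and prop != cat:
                rules.add((cat, prop))  # e.g. ("is_frog", "is_green")
    return rules

# One example is enough to support the generalization.
assert ("is_frog", "is_green") in induce(facts)
```

A deductive system would need "all frogs are green" as a premise before concluding anything about frogs in general; the open question raised above is what theory explains why this unsound-but-useful inference is the one humans make.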
Re: [agi] Enumeration of useful genetic biases for AGI
I don't think there is a simple answer to this problem. We observe very complex behavior in much simpler organisms that lack long term memory or the ability to learn. For example, bees are born knowing how to fly, build hives, gather food, and communicate its location. The complexity of inductive bias is bounded by the complexity of your DNA, about 6 x 10^9 bits. This is probably too high by a few orders of magnitude, just as the number of synapses overestimates the complexity of AGI.

Nevertheless, we risk repeating the error of GOFAI. Early AI researchers were led astray by the successes of explicitly coding knowledge into toy systems. Now we know to use statistical and machine learning techniques, but we may still be led astray by oversimplified models of inductive bias. Certain aspects of the cerebral cortex are highly uniform, which suggests a simple model. But the rest of the brain has a complex structure that is poorly understood. AGI might still be harder than we think. It has happened before. -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message From: Ben Goertzel [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, February 13, 2007 9:28:53 PM Subject: [agi] Enumeration of useful genetic biases for AGI Hi, In a recent offlist email dialogue with an AI researcher, he made the following suggestion regarding the inductive bias that DNA supplies to the human brain to aid it in learning: * What is encoded in the DNA may include a starting ontology (as proposed, with exasperating vagueness, by developmental psychologists, though much more complex than anything they have thought of) but the more important thing is an implicit set of constraints on ontologies that can be discovered by systematic 'scientific' investigation. So it might not work in an arbitrary universe, including some simulated universes, e.g. 'tileworld' universes. One such constraint (as Kant pointed out in 1780) is the assumption that everything physical happens in 3-D space and time.
Another is the requirement for causal determinism (for most processes). There may also be constraints on kinds of information-processing entities that can be learnt about in the environment, e.g. other humans, other animals, dead ancestors, gods, spirits, computer games. The major, substantive ontology extensions have to happen in (partially ordered) stages, each stage building on previous stages, and brain development is staggered accordingly. **

My response to him was that these genetic biases are indeed encoded in the Novamente design, but in a somewhat unsystematic and scattered way. For instance, in the Novamente system,

- the restriction to 3D space is implicit in the set of elementary predicates and procedures supplied to the system for preprocessing perceptual data on its way to abstract cognition
- the bias toward causal determinism is implicit in an inference control mechanism that specifically tries to build PredictiveAttractionLink relationships that embody likely causal relationships

etc. I have actually never gone through the design with an eye towards identifying exactly how each important genetic bias of cognition is encoded in the system. However, this would be an interesting and worthwhile thing to do. Toward that end, it would be interesting to have a systematic list somewhere of the genetic biases that are thought to be most important for structuring human cognition. Does anyone know of a well-thought-out list of this sort? Of course I could make one by surveying the cognitive psych literature, but why reinvent the wheel? -- Ben G
[agi] Re: Languages for AGI
I think choosing an architecture for AGI is a much more important problem than choosing a language. But there are some things we already know about AGI. First, AGI requires a vast amount of knowledge, and therefore a vast amount of computation. Therefore, at least part of the AGI will have to be implemented in a fast (perhaps parallel) language. Second, if you plan on having a team of programmers do the work (rather than all by yourself) then you will have to choose a widely known language. Early work in AI used languages like Lisp or Prolog to directly express knowledge. Now we all know (except at Cycorp) that this does not work. There is too much knowledge to code directly. You will need a learning algorithm and training and test data. The minimum requirement for AGI is a language model, which requires about 10^9 bits of information (based on estimates by Turing and Landauer, and the amount of language processed by adulthood). When you add vision, speech, robotics, etc., it will be more. We don't know how much, but if we use the human brain as a model, then one estimate is the number of synapses (about 10^13) multiplied by the access rate (10 Hz) = 10^14 operations per second. But these numbers are really just guesses. Perhaps they are high, but people have been working on computational shortcuts for the last 50 years without success. My work is in data compression, which I believe is an AI problem. (You might disagree, but first see my argument at http://cs.fit.edu/~mmahoney/compression/rationale.html ). Whether or not you agree, compression, like AGI, requires a great deal of memory and CPU. Many of the top compressors ranked in my benchmark are open source, and of those, the top languages are C++ followed by C and assembler. I don't know of any written in Java, C#, Python, or any interpreted languages, or any that use relational databases. AGI is amenable to parallel computation. 
Language, vision, speech, and robotics all involve combining thousands of soft constraints. This requires vector operations. The fastest way to do this on a PC is to use the parallel MMX and SSE2 instructions (or a GPU) that are not accessible in high level languages. The 16-bit vector dot product that I implemented in MMX as part of the neural network used in the PAQ compressor is 6 times faster than optimized C. Fortunately you do not need a lot of assembler, maybe a couple hundred lines of code to do most of the work. AGI is still an area of research. Not only do you need fast implementations so your experiments finish in reasonable time, but you will need to change your code many times. Train, test, modify, repeat. Your code has to be both optimized and structured so that it can be easily changed in ways you can't predict. This is hard, but unfortunately we do not know yet what will work. -- Matt Mahoney, [EMAIL PROTECTED]
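The 16-bit dot product described above can be sketched for illustration (NumPy's vectorized routines standing in for the MMX/SSE2 instructions; the 6x figure quoted above is for hand-written assembly versus C, not for this Python sketch):

```python
import numpy as np

def dot_scalar(a, b):
    """One multiply-accumulate per step, like unvectorized C code."""
    total = 0
    for x, y in zip(a, b):
        total += int(x) * int(y)
    return total

rng = np.random.default_rng(0)
a = rng.integers(-100, 100, size=4096, dtype=np.int16)
b = rng.integers(-100, 100, size=4096, dtype=np.int16)

# Vectorized version: widen before multiplying so products don't
# overflow int16, analogous to how MMX's pmaddwd widens 16-bit
# products into 32-bit accumulators.
vec = int(np.dot(a.astype(np.int64), b.astype(np.int64)))

assert vec == dot_scalar(a, b)
```

The vectorized call processes many elements per underlying instruction instead of one, which is the same reason the hand-coded MMX loop beats a scalar C loop.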
Re: [agi] Do AGIs dream of electric sheep?
I believe the purpose of sleep in placental and marsupial mammals (the only animals with REM sleep) is to copy medium term (daily) memories from the hippocampus to long term memory in the cortex. In humans, only visual and verbal memories are transferred (as dreams). During deep sleep between dreams, memories in the cortex are played back in reverse and fed back to the hippocampus*, which I believe is the process of erasing medium term memories as part of a feedback loop. An AGI should have a hierarchy of short and long term memory, but I don't believe it is necessary to mimic sleeping and dreaming. I think there are more efficient ways to implement a cache when you remove the limitations of neurons. *discussed on this list. Sorry, I don't remember the reference.

--- Chuck Esterbrook [EMAIL PROTECTED] wrote: This is a light article about the purpose and value of sleep in humans: http://www.dailymail.co.uk/pages/live/articles/technology/technology.html?in_article_id=437683&in_page_id=1965 The article is nothing earth shattering, but it reminded me that I've thought for a long time that an AGI would likely have a sleep cycle to perform various functions such as optimizing memory retrieval, learning new associations, solving problems, etc. What about the AGIs that people are building or working towards, such as those from Novamente, AdaptiveAI, Hall, etc.? Do/Will your systems have sleep periods for internal maintenance and improvement? If so, what types of activities do they perform during sleep? Or feel free to chime in with thoughts on AGI and sleep even if you haven't begun building yet... -Chuck

-- Matt Mahoney, [EMAIL PROTECTED]
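One "more efficient way to implement a cache" might be an ordinary bounded LRU store (my own assumption about what a short-term memory tier could look like; this is not from any actual AGI design):

```python
from collections import OrderedDict

class ShortTermMemory:
    """Bounded LRU store. On overflow the least recently used item is
    evicted; a consolidation step (the analogue of sleep) would copy
    evicted items to long-term storage instead of discarding them."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # refresh recency on access
        return self.items[key]

stm = ShortTermMemory(2)
stm.put("a", 1)
stm.put("b", 2)
stm.get("a")     # touch "a", making "b" the least recently used
stm.put("c", 3)  # capacity exceeded: "b" is evicted
assert stm.get("b") is None and stm.get("a") == 1
```

Unlike the hippocampus-to-cortex replay described above, nothing here requires an offline phase; eviction and consolidation can happen incrementally on every access.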
Re: [agi] The Missing Piece
--- Andrii (lOkadin) Zvorygin [EMAIL PROTECTED] wrote: Hmmm, if you could put some basic rules on the randomness (in a database of Lojban that gives a random statement or series of statements), say to accept logical statements that could then be applied onto input. So say you say something like le MLAtu cu GLEki (the cat is happy) and later make a statement le MLAtu and press return, it could ask you cu GLEki gi'a mo (is happy or is what function?). If it was to be a chat bot, it could wait for a reply and if it believes no one is interested it could offer a random phrase as a topic such as le MLAtu cu GLEki. So maybe some can try approaching AI from the other way around? Instead of going bottom up from purely unambiguous code to restricted randomness of interaction, go from pure randomness to restricted randomness of interaction. Does anyone know what would be a good language to do that in? I think I recall there being a programming language based on set theory that was all about streams.

What about English? Irregular grammar is only a tiny part of the language modeling problem. Using an artificial language with a regular grammar to simplify the problem is a false path. If people actually used Lojban then it would be used in ways not intended by the developers, and it would develop all the warts of real languages. The real problem is to understand how humans learn language. -- Matt Mahoney, [EMAIL PROTECTED]