Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Eliezer S. Yudkowsky
As long as we're talking about fantasy applications that require 
superhuman AGI, I'd be impressed by a lossy compression of Wikipedia 
that decompressed to a non-identical version carrying the same semantic 
information.


--
Eliezer S. Yudkowsky  http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread J. Andrew Rogers


On Aug 12, 2006, at 6:27 PM, Yan King Yin wrote:
I think compression is essential to intelligence, but the  
difference between lossy and lossless may make the algorithms quite  
different.



For general algorithms (e.g. ones that do not play to the sensory  
biases of humans) there should be little difference in theory.  The  
practical problem with a massive corpus like Wikipedia is that it is  
so large that resource constraints dictate compression strategies if  
it must be lossless.


I make my lossless representation system lossy with a mechanism that  
loses the least useful patterns (at the last possible moment) such  
that it always fits within some memory resource bound with a minimum  
impact on predictive performance.  In theory I could just disable the  
resource bounding function and fire up the Wikipedia data set, but it  
is so large that I would have to optimize the hell out of the code to  
have a prayer of the problem fitting on modest hardware.  With the  
constraints given and the size of the data, the contest seems to  
encourage algorithms that are fitted to the problem rather than  
algorithms that are general in a fashion useful to AI.
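
To illustrate the general shape of such a mechanism, here is a toy sketch in Python; the usage-count "usefulness" score and the data structure are invented for illustration and are not the system described above:

class BoundedPatternStore:
    def __init__(self, max_patterns):
        self.max_patterns = max_patterns   # the memory resource bound
        self.usefulness = {}               # pattern -> crude usefulness score

    def observe(self, pattern):
        # Store losslessly until the bound is reached...
        self.usefulness[pattern] = self.usefulness.get(pattern, 0) + 1
        # ...then, at the last possible moment, drop the least useful pattern.
        if len(self.usefulness) > self.max_patterns:
            victim = min(self.usefulness, key=self.usefulness.get)
            del self.usefulness[victim]

store = BoundedPatternStore(max_patterns=3)
for p in ["the", "roses", "are", "the", "red", "the", "roses"]:
    store.observe(p)
print(store.usefulness)   # frequently used patterns survive within the bound

A real system would of course score patterns by their impact on predictive performance rather than by raw usage counts.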


Reducing the data set an order of magnitude or so might make a more  
practical case for the intended purpose in that it would make more  
creative modeling algorithms plausible.



J. Andrew Rogers



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Russell Wallace
On 8/13/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
There is no knowledge that you can demonstrate verbally that cannot also be
learned verbally.

An unusual claim... do you mean all knowledge can be learned verbally,
or do you think there are some kinds of knowledge that cannot be
demonstrated verbally?




Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Matt Mahoney
Hutter's only assumption about AIXI is that the environment can be simulated by 
a Turing machine.

With regard to forgetting, I think it plays a minor role in language modeling 
compared to vision and hearing.  To model those, you need to understand what 
the brain filters out.  Lossy compression formats like JPEG and MP3 exploit 
this by discarding what cannot be seen or heard.  However, text doesn't work 
this way.  How much can you discard from a text file before it differs 
noticeably?
 
-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Pei Wang <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, August 12, 2006 8:53:40 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

So you mean we should leave forgetting out of the picture, just
because we don't know how to objectively measure it.

Though objectiveness is indeed desired for almost all measurements, it
is not the only requirement for a good measurement of intelligence.
Someone can objectively measure a wrong property of a system.

I haven't been convinced why "lossless compression" can be taken as an
indicator of intelligence, except that it is objective and easy to
check. You wrote in your website that "Hutter [21,22] proved that
finding the optimal behavior of a rational agent is equivalent to
compressing its observations.", but his proof is under certain
assumptions about the agent and its environment. Do these assumptions
hold for the human mind or AGI in general?

Pei


On 8/12/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> "Forgetting" is an important function in human intelligence because the 
> storage capacity of the brain is finite.  This is a form of lossy 
> compression, discarding the least important information.  Unfortunately, 
> lossy compression cannot be evaluated objectively.  We can compare an image 
> compressed with JPEG with an equal sized image compressed by discarding the 
> low order bits of each pixel, and judge the JPEG image to be of higher 
> quality.  JPEG uses a better model of the human visual system by discarding 
> the same information that the human visual perception process does.  It is 
> more intelligent.  Lossy image compression is a valid but subjective 
> evaluation of models of human vision.  There is no objective algorithm to 
> test for image quality.  It has to be done by humans.
>
> A lossless image compression contest would not measure intelligence because 
> you are modeling the physics of light and matter, not something that comes 
> from humans.  Also, the vast majority of information in a raw image is 
> useless noise, which is not compressible.  A good model of the compressible 
> parts would have only a small effect.  It is better to discard the noise.
>
> We are a long way from understanding vision.  Standing in 1973 measured 
> subjects' ability to memorize 10,000 pictures, viewed for 5 seconds each, then 
> 2 days later in a recall test showed pictures and asked if they were in the 
> earlier set, which they did correctly much of the time [1].  You could 
> achieve the same result if you compressed each picture to about 30 bits and 
> compared Hamming distances.  This is a long term learning rate of 6 bits per 
> second for images, or 2 x 10^9 bits over a lifetime, assuming we don't forget 
> anything after 2 days.  Likewise, Landauer [2] estimated human long term 
> memory at 10^9 bits based on rates of learning and forgetting.  It is also 
> about how much information you can absorb as speech or writing in a lifetime 
> assuming 150 words per minute at 1 bpc entropy.  It seems that the long term 
> learning rate of the brain is independent of the medium.  This is why I chose 
> 1 GB of text for the benchmark.
>
> Text compression measures intelligence because it models information that 
> comes from the human brain, not an external source.  Also, there is very 
> little noise in text.  If a paragraph can be rephrased in 1000 different ways 
> without changing its meaning, it only adds 10 more bits to code which 
> representation was chosen.  That is why lossless compression makes sense.
>
> [1] Standing, L. (1973), "Learning 10,000 Pictures", Quarterly Journal of 
> Experimental Psychology (25) pp. 207-222.
>
> [2] Landauer, Tom (1986), "How much do people remember?  Some estimates of 
> the quantity of learned information in long term memory", Cognitive Science 
> (10) pp. 477-493.
>
>  -- Matt Mahoney, [EMAIL PROTECTED]
>
> - Original Message 
> From: Pei Wang <[EMAIL PROTECTED]>
> To: agi@v2.listbox.com
> Sent: Saturday, August 12, 2006 4:03:55 PM
> Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge 
> prize
>
> Matt,
>
> To summarize and generalize data and to use the summary to predict the
> future is no doubt at the core of intelligence. However, I do not call
> this process "compressing", because the result is not faultless, that
> is, there is information loss.
>
> It is not only because

Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Matt Mahoney
Most high end compressors are actually implemented using an explicit prediction model.  That includes any compressor using arithmetic coding, which is all of the compressors listed here as CM or PPM: http://cs.fit.edu/~mmahoney/compression/text.html

There is no knowledge that you can demonstrate verbally that cannot also be learned verbally.  Your problem with the glass spheres can be answered by knowing the physics of light refraction and the distance to the paper.  It might help to have vision to solve it, but it is not necessary, just like it is helpful but not necessary to draw diagrams to prove theorems in geometry [1].  Besides, most people would probably not know the answer anyway.

I do not claim that the Wikipedia benchmark has the right knowledge for a compressor to solve your glass sphere problem, or many other problems.  I do believe the 1 GB set contains enough information to write and recognize novel, grammatically and semantically correct sentences and coherent paragraphs in the style of Wikipedia articles.

[1] Gelernter, H., Realization of a Geometry-Theorem Proving Machine, Proceedings of an International Conference on Information Processing, Paris: UNESCO House, pp. 273-282, 1959.

 -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Russell Wallace <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, August 12, 2006 8:44:37 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

On 8/13/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
Whether or not a compressor implements a model as a predictor is
irrelevant.  Modeling the entire input at once is mathematically
equivalent to predicting successive symbols.  Even if you think
you are not modeling, you are.  If you design a code so that s is
coded in n bits, you are implicitly assigning p(s) = 2^-n.  Any
compression or decompression algorithm can be expressed in terms of
prediction.
You can take the view that there is this implicit mathematical
equivalence if you wish, but that doesn't change the fact that typical
compression programs don't actually predict anything.

Also, Turing would disagree with your definition of AI.  The Turing test does not require vision or the ability to draw.
The Turing test is known to be nowhere near as sound as was believed in
Turing's day; we now know that the human tendency to anthropomorphize is
strong enough that Elizas have been taken for human. Basically,
language is an ultra-low-bandwidth medium, so much so that a great deal
has to be assumed to make it work. Adding visual elements
would make things much faster and more accurate because you're not
desperately trying to strain meaning from very small quantities of data.

But while gratuitously difficult, it is possible - so yes, passing a
properly administered Turing test _does_ require vision and the ability
to draw. You'd want to pose questions like... let's see...

"Consider a 3 inch solid sphere of red glass. Embedded at the center is
a 1 inch solid sphere of blue glass. Shine a white light through it
onto a white sheet of paper. What appears on the paper?"

Basically you need to ask the sort of questions that a blind, paralyzed
human would need a still-functioning visual cortex to answer.



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Yan King Yin
 
I think compression is essential to intelligence, but the difference between lossy and lossless may make the algorithms quite different.
 
But why not let competitors compress lossily?  As far as prediction goes, the testing part is still the same!
 
If you guys have a lossy version of the prize I will definitely join =)
 
YKY



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Ben Goertzel

Hi,


> > But Shane, your 19 year old self had a much larger and more diverse
> > volume of data to go on than just the text or speech that you
> > ingested...


> I would claim that a blind and deaf person at 19 could pass a
> Turing test if they had been exposed to enough information over
> the years.  Especially if they had the ability to read everything
> that was ever spoken to them.  So I don't see why you would
> need a corpus billions of times larger than this, as you suggested.


I don't think a blind and deaf person could pass an "imitate a
nonblind and nondeaf person" Turing test

> > And, of course, your ability to predict your next verbal response is
> > NOT a good indicator of your ability to adaptively deal with new
> > situations...


> All I'm talking about is predicting well enough to pass a Turing
> test... that was my claim:  That with an amazingly good compressor
> of my life's spoken and written words you could construct a machine
> that would pass a Turing test.


I suppose you're right, but it depends on the quality of the Turing Test.

I suppose a clever enough Turing test could be constructed that would
catch this imitator, but I agree that it could pass the ordinary
Turing test (which isn't aimed at presenting an AI with unpredictable
situations requiring intelligent-human-like creativity)

So, I mostly concede your point...


> But if it can't make good predictions to random questions given to
> me in a Turing test, then it's not an "amazingly good compressor"
> of the first 20 years of my life.  Indeed the first 20 years of my life
> would involve tens of thousands of conversations, and I presume on
> all of them my responses would have been good enough to pass a
> Turing test.


This all depends on the degree to which the compressor is "overfit" to
your particular history, as opposed to abstracting from it the generic
patterns that make you *you* ...

-- Ben



Re: [agi] FYI: The Human Speechome Project

2006-08-12 Thread Gabriel Recchia
I got a chance to meet Deb Roy at CogSci two weeks ago and saw his poster on this project--quite impressive.  Very exciting as a potential solution to the knowledge acquisition bottleneck.  Not *all* external stimuli relevant to language learning are captured--for example, Deb said that the video isn't fine-grained enough to accurately assess eye gaze.  But it's by far the most promising effort I've heard of to build a computational model of language grounded in real sensory data.
Datasets like this, which connect natural language to a low-level representation of the domain that it describes over long periods of time, could be very useful training sets for AGI research.  The Speechome Project is obviously one example, though it's restricted for privacy reasons.  But other hypothetical examples could include:
- recordings of humans interacting in a virtual environment such as AGI-SIM over several months, speaking only about things directly pertaining to that environment
- video of hundreds of online chess games, paired move-by-move with text files that describe the spatial relationships between the pieces over time in a variety of different ways ("the rook came out from behind the pawn", "the queen just moved between White's two bishops").  One would not expect a learning algorithm trained on this data to be able to talk about chess in general, but one would hope it could learn to express complex spatial relationships between the pieces in natural language.
- a recording of virtually any natural or sub-symbolic artificial environment over a very long period of time, paired with lots of natural language restricted to that domain.  (It seems to me that even symbolic artificial environments could possibly work, if the symbols expressed fine-grained enough concepts, but I'll leave that debate aside for the moment.)
Theoretically, the appropriate learning algorithm, trained on such a dataset, could learn how to talk about that domain in natural language--a good indication that the algorithm works and a cool proof-of-concept to show to others.  
The specific examples above were just off the top of my head; I wouldn't be surprised if they have shortcomings that would make them suboptimal training sets.  Can anyone comment on what properties a training set like this would need to have in order to be really useful?  If something like this existed and was made freely available, would people be interested in using it?
On 8/12/06, Pei Wang <[EMAIL PROTECTED]> wrote:
See the paper at
http://www.cogsci.rpi.edu/CSJarchive/Proceedings/2006/docs/p2059.pdf

ABSTRACT:

The Human Speechome Project is an effort to observe and computationally
model the longitudinal course of language development of a single child at
an unprecedented scale. The idea is this: Instrument a child's home so that
nearly everything the child hears and sees from birth to three is recorded.
Develop a computational model of language learning that takes the child's
audio-visual experiential record as input. Evaluate the model's performance
in matching the child's linguistic abilities as a means of assessing
possible learning strategies used by children in natural contexts. First
steps of a pilot effort along these lines are described including issues of
privacy management and methods for overcoming limitations of
fully-automated machine perception.



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Pei Wang

Matt,

So you mean we should leave forgetting out of the picture, just
because we don't know how to objectively measure it.

Though objectiveness is indeed desired for almost all measurements, it
is not the only requirement for a good measurement of intelligence.
Someone can objectively measure a wrong property of a system.

I haven't been convinced why "lossless compression" can be taken as an
indicator of intelligence, except that it is objective and easy to
check. You wrote in your website that "Hutter [21,22] proved that
finding the optimal behavior of a rational agent is equivalent to
compressing its observations.", but his proof is under certain
assumptions about the agent and its environment. Do these assumptions
hold for the human mind or AGI in general?

Pei


On 8/12/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:

"Forgetting" is an important function in human intelligence because the storage 
capacity of the brain is finite.  This is a form of lossy compression, discarding the 
least important information.  Unfortunately, lossy compression cannot be evaluated 
objectively.  We can compare an image compressed with JPEG with an equal sized image 
compressed by discarding the low order bits of each pixel, and judge the JPEG image to be 
of higher quality.  JPEG uses a better model of the human visual system by discarding the 
same information that the human visual perception process does.  It is more intelligent.  
Lossy image compression is a valid but subjective evaluation of models of human vision.  
There is no objective algorithm to test for image quality.  It has to be done by humans.

A lossless image compression contest would not measure intelligence because you 
are modeling the physics of light and matter, not something that comes from 
humans.  Also, the vast majority of information in a raw image is useless 
noise, which is not compressible.  A good model of the compressible parts would 
have only a small effect.  It is better to discard the noise.

We are a long way from understanding vision.  Standing in 1973 measured subjects' 
ability to memorize 10,000 pictures, viewed for 5 seconds each, then 2 days 
later in a recall test showed pictures and asked if they were in the earlier 
set, which they did correctly much of the time [1].  You could achieve the same 
result if you compressed each picture to about 30 bits and compared Hamming 
distances.  This is a long term learning rate of 6 bits per second for images, 
or 2 x 10^9 bits over a lifetime, assuming we don't forget anything after 2 
days.  Likewise, Landauer [2] estimated human long term memory at 10^9 bits 
based on rates of learning and forgetting.  It is also about how much 
information you can absorb as speech or writing in a lifetime assuming 150 
words per minute at 1 bpc entropy.  It seems that the long term learning rate 
of the brain is independent of the medium.  This is why I chose 1 GB of text 
for the benchmark.

Text compression measures intelligence because it models information that comes 
from the human brain, not an external source.  Also, there is very little noise 
in text.  If a paragraph can be rephrased in 1000 different ways without 
changing its meaning, it only adds 10 more bits to code which representation 
was chosen.  That is why lossless compression makes sense.

[1] Standing, L. (1973), "Learning 10,000 Pictures", Quarterly Journal of 
Experimental Psychology (25) pp. 207-222.

[2] Landauer, Tom (1986), "How much do people remember?  Some estimates of the 
quantity of learned information in long term memory", Cognitive Science (10) pp. 
477-493.

 -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Pei Wang <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, August 12, 2006 4:03:55 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

To summarize and generalize data and to use the summary to predict the
future is no doubt at the core of intelligence. However, I do not call
this process "compressing", because the result is not faultless, that
is, there is information loss.

It is not only because human brains are "noisy analog devices",
but because the future is different from the past, and the mind works
under resource restrictions. Only when certain information is
temporarily or permanently ignored (forgotten) can the system
efficiently use its knowledge.

For this reason, I'd make a conjecture that is the opposite of Hutter's: A
necessary condition for a system to be intelligent is that it can
forget.

Of course it is not a sufficient condition for a system to be intelligent. ;-)

Pei










Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Russell Wallace
On 8/13/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
Whether or not a compressor implements a model as a predictor is
irrelevant.  Modeling the entire input at once is mathematically
equivalent to predicting successive symbols.  Even if you think
you are not modeling, you are.  If you design a code so that s is
coded in n bits, you are implicitly assigning p(s) = 2^-n.  Any
compression or decompression algorithm can be expressed in terms of
prediction.
You can take the view that there is this implicit mathematical
equivalence if you wish, but that doesn't change the fact that typical
compression programs don't actually predict anything.

Also, Turing would disagree with your definition of AI.  The Turing test does not require vision or the ability to draw.
The Turing test is known to be nowhere near as sound as was believed in
Turing's day; we now know that the human tendency to anthropomorphize is
strong enough that Elizas have been taken for human. Basically,
language is an ultra-low-bandwidth medium, so much so that a great deal
has to be assumed to make it work. Adding visual elements
would make things much faster and more accurate because you're not
desperately trying to strain meaning from very small quantities of data.

But while gratuitously difficult, it is possible - so yes, passing a
properly administered Turing test _does_ require vision and the ability
to draw. You'd want to pose questions like... let's see...

"Consider a 3 inch solid sphere of red glass. Embedded at the center is
a 1 inch solid sphere of blue glass. Shine a white light through it
onto a white sheet of paper. What appears on the paper?"

Basically you need to ask the sort of questions that a blind, paralyzed
human would need a still-functioning visual cortex to answer.



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Shane Legg
> But Shane, your 19 year old self had a much larger and more diverse
> volume of data to go on than just the text or speech that you
> ingested...

I would claim that a blind and deaf person at 19 could pass a
Turing test if they had been exposed to enough information over
the years.  Especially if they had the ability to read everything
that was ever spoken to them.  So I don't see why you would
need a corpus billions of times larger than this, as you suggested.

> And, of course, your ability to predict your next verbal response is
> NOT a good indicator of your ability to adaptively deal with new
> situations...

All I'm talking about is predicting well enough to pass a Turing
test... that was my claim:  That with an amazingly good compressor
of my life's spoken and written words you could construct a machine
that would pass a Turing test.

> I do not assume that an outstanding compressor of your verbal inputs
> and outputs would necessarily be a great predictor of your future
> verbal inputs and outputs -- because there is much more to you than
> verbalizations.  It might make bad errors in predicting your responses
> in situations different from ones you had previously experienced... or
> in situations similar to situations you had previously experienced but
> that did not heavily involve verbiage...

But if it can't make good predictions to random questions given to
me in a Turing test, then it's not an "amazingly good compressor"
of the first 20 years of my life.  Indeed the first 20 years of my life
would involve tens of thousands of conversations, and I presume on
all of them my responses would have been good enough to pass a
Turing test.

Shane



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Matt Mahoney
Whether or not a compressor implements a model as a predictor is irrelevant.  Modeling the entire input at once is mathematically equivalent to predicting successive symbols.  Even if you think you are not modeling, you are.  If you design a code so that s is coded in n bits, you are implicitly assigning p(s) = 2^-n.  Any compression or decompression algorithm can be expressed in terms of prediction.

Also, Turing would disagree with your definition of AI.  The Turing test does not require vision or the ability to draw.
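
For concreteness, a minimal Python sketch of that last point (the code-length table is invented purely for illustration):

import math

# A prefix-free code someone might design without thinking about models:
code_lengths_bits = {"the": 2, "roses": 4, "are": 3, "red": 3, "zygote": 9}

# Designing that code implicitly assigns each string s the probability 2^-n,
# where n is its code length in bits.
implied_p = {s: 2.0 ** -n for s, n in code_lengths_bits.items()}

for s, p in implied_p.items():
    # Recovering the code length from the implied model: n = log2(1/p).
    print(f"{s!r}: {code_lengths_bits[s]} bits -> implied p = {p:.4f} "
          f"-> log2(1/p) = {math.log2(1/p):.0f} bits")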
 -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Russell Wallace <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, August 12, 2006 7:09:50 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

On 8/12/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
First,
the compression problem is not in NP.  The general problem of
encoding strings as the smallest programs to output them is undecidable.
But as I said, it becomes NP when there's an upper limit to decompression time.

Second,
given a model, then compression is the same as prediction.  A
model is a function that maps any string s to an estimated probability
p(s).
However in this case we are not given such a function, nor is there any
need to create one - lots of compression algorithms work fine without
them.

Third,
given a fixed test set, it would be trivial to write a decompressor
that memorized it verbatim and compress to 0 bytes if we did not
include the size of the decompressor in the contest.
Of course - that's why I said there would be a problem with allowing decompression to rely on a local knowledge base.

It is easy to dismiss
compression as unrelated to AGI.
Yes, it was quite easy :)

How do you test if a machine
with only text I/O knows that roses are red?
Very easily...

Suppose it sees "red
roses", then later "roses are" and predicts "red".
Sure, but that only shows it knows that the string "roses are" is
followed by the string "red", not that it knows roses are red. To test
the latter requires showing it pictures of flowers of different colors
and seeing whether it can use the red color as a cue to pick out the
roses, or telling it to draw a picture of a rose and seeing if it fills
in the color correctly.

A machine with only text I/O of course automatically fails, so you know
even without running it that it cannot know that roses are red. See,
told you the testing process would be easy :)



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Matt Mahoney
"Forgetting" is an important function in human intelligence because the storage 
capacity of the brain is finite.  This is a form of lossy compression, 
discarding the least important information.  Unfortunately, lossy compression 
cannot be evaluated objectively.  We can compare an image compressed with JPEG 
with an equal sized image compressed by discarding the low order bits of each 
pixel, and judge the JPEG image to be of higher quality.  JPEG uses a better 
model of the human visual system by discarding the same information that the 
human visual perception process does.  It is more intelligent.  Lossy image 
compression is a valid but subjective evaluation of models of human vision.  
There is no objective algorithm to test for image quality.  It has to be done 
by humans.

A lossless image compression contest would not measure intelligence because you 
are modeling the physics of light and matter, not something that comes from 
humans.  Also, the vast majority of information in a raw image is useless 
noise, which is not compressible.  A good model of the compressible parts would 
have only a small effect.  It is better to discard the noise.

We are a long way from understanding vision.  Standing in 1973 measured subjects' 
ability to memorize 10,000 pictures, viewed for 5 seconds each, then 2 days 
later in a recall test showed pictures and asked if they were in the earlier 
set, which they did correctly much of the time [1].  You could achieve the same 
result if you compressed each picture to about 30 bits and compared Hamming 
distances.  This is a long term learning rate of 6 bits per second for images, 
or 2 x 10^9 bits over a lifetime, assuming we don't forget anything after 2 
days.  Likewise, Landauer [2] estimated human long term memory at 10^9 bits 
based on rates of learning and forgetting.  It is also about how much 
information you can absorb as speech or writing in a lifetime assuming 150 
words per minute at 1 bpc entropy.  It seems that the long term learning rate 
of the brain is independent of the medium.  This is why I chose 1 GB of text 
for the benchmark.

Text compression measures intelligence because it models information that comes 
from the human brain, not an external source.  Also, there is very little noise 
in text.  If a paragraph can be rephrased in 1000 different ways without 
changing its meaning, it only adds 10 more bits to code which representation 
was chosen.  That is why lossless compression makes sense.
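
A quick back-of-the-envelope check of those figures in Python (the hours-per-day value is my own assumption; the other numbers are the ones above):

import math

# Standing (1973): pictures viewed for 5 seconds each, later recognized about as
# well as if each had been stored as a ~30-bit code compared by Hamming distance.
rate_bps = 30 / 5                                   # 6 bits per second
print(rate_bps, "bits/second")

hours_per_day = 2                                   # assumed hours of attentive input per day
seconds = 70 * 365 * hours_per_day * 3600
print(f"images: {rate_bps * seconds:.1e} bits")     # ~1e9 bits over a lifetime

# Speech/reading: 150 words/minute, ~6 characters per word, ~1 bit per character.
bits_per_minute = 150 * 6 * 1.0
minutes = 70 * 365 * hours_per_day * 60
print(f"text:   {bits_per_minute * minutes:.1e} bits")   # ~3e9 bits, the same order

# Picking one of 1000 equivalent rephrasings of a paragraph costs only:
print(f"rephrasing: {math.log2(1000):.1f} bits")    # ~10 bits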
 
[1] Standing, L. (1973), “Learning 10,000 Pictures”, Quarterly Journal of 
Experimental Psychology (25) pp. 207-222.  

[2] Landauer, Tom (1986), “How much do people remember?  Some estimates of the 
quantity of learned information in long term memory”, Cognitive Science (10) 
pp. 477-493.
 
 -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Pei Wang <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, August 12, 2006 4:03:55 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

Matt,

To summarize and generalize data and to use the summary to predict the
future is no doubt at the core of intelligence. However, I do not call
this process "compressing", because the result is not faultless, that
is, there is information loss.

It is not only because human brains are "noisy analog devices",
but because the future is different from the past, and the mind works
under resource restrictions. Only when certain information is
temporarily or permanently ignored (forgotten) can the system
efficiently use its knowledge.

For this reason, I'd make a conjecture that is the opposite of Hutter's: A
necessary condition for a system to be intelligent is that it can
forget.

Of course it is not a sufficient condition for a system to be intelligent. ;-)

Pei







Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Russell Wallace
On 8/12/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
First,
the compression problem is not in NP.  The general problem of
encoding strings as the smallest programs to output them is undecidable.
But as I said, it becomes NP when there's an upper limit to decompression time.

Second,
given a model, then compression is the same as prediction.  A
model is a function that maps any string s to an estimated probability
p(s).
However in this case we are not given such a function, nor is there any
need to create one - lots of compression algorithms work fine without
them.

Third,
given a fixed test set, it would be trivial to write a decompressor
that memorized it verbatim and compress to 0 bytes if we did not
include the size of the decompressor in the contest.
Of course - that's why I said there would be a problem with allowing decompression to rely on a local knowledge base.

It is easy to dismiss
compression as unrelated to AGI.
Yes, it was quite easy :)

How do you test if a machine
with only text I/O knows that roses are red?
Very easily...

Suppose it sees "red
roses", then later "roses are" and predicts "red".
Sure, but that only shows it knows that the string "roses are" is
followed by the string "red", not that it knows roses are red. To test
the latter requires showing it pictures of flowers of different colors
and seeing whether it can use the red color as a cue to pick out the
roses, or telling it to draw a picture of a rose and seeing if it fills
in the color correctly.

A machine with only text I/O of course automatically fails, so you know
even without running it that it cannot know that roses are red. See,
told you the testing process would be easy :)



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Matt Mahoney
First, the compression problem is not in NP.  The general problem of encoding strings as the smallest programs to output them is undecidable.

Second, given a model, then compression is the same as prediction.  A model is a function that maps any string s to an estimated probability p(s).  A compressor then maps s to a code of length log(1/p(s)).  The decompressor does the inverse mapping.  The compressor and decompressor only need to agree on the model p(), and can then use identical algorithms to assign a mapping.  (This step is deterministic, so not possible by humans).  Modeling is the same as prediction because

  p(s) = PROD_i p(s_i | s_1 s_2 ... s_{i-1}),

which is the product of conditional probabilities over the next symbol s_i given all of the previous symbols s_1 through s_{i-1} in s.

Third, given a fixed test set, it would be trivial to write a decompressor that memorized it verbatim and compress to 0 bytes if we did not include the size of the decompressor in the contest.  Instead you have to start with a small amount of knowledge coded into the decompressor and learn the rest from the data itself.  This is a test of language learning ability.

It is easy to dismiss compression as unrelated to AGI.  How do you test if a machine with only text I/O knows that roses are red?  Suppose it sees "red roses", then later "roses are" and predicts "red".  An LSA or distant-bigram model will do this.
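
A minimal sketch of that equivalence in Python (the adaptive order-0 model is a toy stand-in for a real compressor's model):

import math
from collections import Counter

def code_length_bits(s):
    """Total code length of s under a toy adaptive order-0 model.

    The model predicts each next symbol from the counts of symbols seen so far
    (with add-one smoothing) and charges log2(1/p) bits for it.  Summing those
    per-symbol costs is the same as coding the whole string at once, since p(s)
    is the product of the conditional probabilities.
    """
    counts = Counter()
    alphabet = set(s)                      # assume the alphabet is known in advance
    total_bits = 0.0
    for symbol in s:
        p = (counts[symbol] + 1) / (sum(counts.values()) + len(alphabet))
        total_bits += math.log2(1.0 / p)   # cost of coding this prediction
        counts[symbol] += 1                # update the model, as the decompressor also would
    return total_bits

print(code_length_bits("roses are red, violets are blue, roses are red"))
print(code_length_bits("abcdefghijklmnopqrstuvwxyz"))   # no repetition: compresses poorly

An arithmetic coder driven by the same model would emit within a couple of bits of that total, which is the sense in which the model and the compressor are the same thing.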
 -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Russell Wallace <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, August 12, 2006 5:30:21 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

On 8/12/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
In order to compress text well, the compressor must be able to estimate probabilities over text strings, i.e. predict text.
Um no, the compressor doesn't need to predict anything - it has the entire file already at hand.

The _de_compressor would benefit from being able to predict, e.g.
"roses are red" - the third word need not be sent if the decompressor
can be relied on to know what it will be given the first two.

However, this is not permitted by the terms of the prize: the
compressed file cannot depend on a knowledge base at the receiving end;
it must run on a bare PC. Therefore the challenge is a purely
mathematical one (in class NP, given a limit on decompression time),
and not related to AGI.

Even if the terms of the prize did allow a knowledge base at the
receiving end (which would be problematic for a compression benchmark;
it would be very difficult to make the test objective), it still
wouldn't really be related to AGI. A good decompressor would know that
the words "roses are" tend to be followed by the word "red" - but it
would not know that the three words in sequence mean that roses are red.



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Matt Mahoney
Your idea of human-assisted compression is something I was considering using to measure the entropy of the Hutter prize dataset.  This is similar to what Shannon did in 1950 when he used text prediction to estimate the entropy of written English.  He allowed subjects to use n-gram and word frequency tables to help them.  I believe this is fair.

I know about Cover and King's attempt to measure entropy in 1978 using a gambling game, and Tan's measurement of Malay text using the same method.  This was an attempt to have people explicitly assign probabilities to the next letter by having a group of people place bets.  They obtained a figure of 1.3 to 1.7 bpc for individuals and 1.3 combined for a single sentence from the same book (Jefferson the Virginian) that Shannon used in one of his tests.  One sentence is not a very big sample.  Another problem with this test is that it is very tedious: it took 5 hours to measure that sentence.  Also, people do not always bet rationally.  People overestimate the probability of rare events, which is why we have lotteries and insurance.  This effect would result in an artificially high measurement.

Are you aware of any more recent work in this area?

 -- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Shane Legg <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Saturday, August 12, 2006 3:48:27 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

That seems clear.

Human-level AGI =/=> Good Hutter test result

just as

Human =/=> Good Hutter test result

My suggestion then is to very slightly modify the test as follows:  Instead of just getting the raw characters, what you get is the sequence of characters and the probability distribution over the next character as predicted by a standard compressor.  You (meaning the algorithm or person being tested) can then choose to modify this distribution before it is used for compression.  So, for example, when the compressor is extremely certain that the next characters are "

However, I am uncertain whether

Amazingly outstanding Hutter test result ==> powerful AGI

At least I think you'll agree that an amazingly outstanding Hutter test result (possibly on an even larger text corpus that included conversations etc.) would allow you to then construct a machine that would pass the Turing test?

Shane



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Russell Wallace
On 8/12/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
In order to compress text well, the compressor must be able to estimate probabilities over text strings, i.e. predict text.
Um no, the compressor doesn't need to predict anything - it has the entire file already at hand.

The _de_compressor would benefit from being able to predict, e.g.
"roses are red" - the third word need not be sent if the decompressor
can be relied on to know what it will be given the first two.

However, this is not permitted by the terms of the prize: the
compressed file cannot depend on a knowledge base at the receiving end;
it must run on a bare PC. Therefore the challenge is a purely
mathematical one (in class NP, given a limit on decompression time),
and not related to AGI.

Even if the terms of the prize did allow a knowledge base at the
receiving end (which would be problematic for a compression benchmark;
it would be very difficult to make the test objective), it still
wouldn't really be related to AGI. A good decompressor would know that
the words "roses are" tend to be followed by the word "red" - but it
would not know that the three words in sequence mean that roses are red.



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Ben Goertzel

> > Yes, I think a hybridized AGI and compression algorithm could do
> > better than either one on its own  However, this might result in
> > an incredibly slow compression process, depending on how fast the AGI
> > thinks.  (It would take ME a long time to carry out this process over
> > the whole Hutter corpus...)


> Estimate the average compression by sampling.


That is certainly a better approach, and moving toward something
sensible and viable...

But it is **not** the Hutter Contest...

I still think that this is not a good approach for benchmarking
early-stage AGI's, which may not be verbal at all in their focus (just
as young children are not).

But for AGI's that are advanced enough to have fluent language
understanding, I think that ability to compress random samples of a
corpus in collaboration with powerful narrow-AI compression algorithms
**might** be (with suitable care) turned into a viable sort of
intelligence test...

-- Ben



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Ben Goertzel

I don't think it's anywhere near that much.  I read at about 2 KB
per minute, and I listen to speech (if written down as plain text)
at a roughly similar speed.  If you then work it out, by the time
I was 20 I'd read/heard not more than 2 or 3 GB of raw text.
If you could compress/predict everything that I'd read or heard
until I was 20 years old *amazingly well*, then I'm sure you'd
be able to use this predictive model to easily pass a Turing test.

Indeed it's trivially true: Just have me sit a Turing test when I
was 19.  Naturally I would have passed it, and thus so would
the compressor/predictor (assuming that it's amazingly good,
or at least as good at predicting my responses as I would be).

Shane


But Shane, your 19 year old self had a much larger and more diverse
volume of data to go on than just the text or speech that you
ingested...

And, of course, your ability to predict your next verbal response is
NOT a good indicator of your ability to adaptively deal with new
situations...

I don't buy your "trivial proof" at all...

I do not assume that an outstanding compressor of your verbal inputs
and outputs would necessarily be a great predictor of your future
verbal inputs and outputs -- because there is much more to you than
verbalizations.  It might make bad errors in predicting your responses
in situations different from ones you had previously experienced... or
in situations similar to situations you had previously experienced but
that did not heavily involve verbiage...

-- Ben



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Shane Legg
> Yes, I think a hybridized AGI and compression algorithm could do
> better than either one on its own  However, this might result in
> an incredibly slow compression process, depending on how fast the AGI
> thinks.  (It would take ME a long time to carry out this process over
> the whole Hutter corpus...)

Estimate the average compression by sampling.

> Also, not all narrow-AI compression algorithms will necessarily be
> able to produce output in the style you describe above.  Standard LZ

Sure, use a PPM compressor perhaps.

> > At least I think you'll agree that an amazingly outstanding Hutter
> > test result (possibly on an even larger text corpus that included
> > conversations etc.) would allow you to then construct a machine
> > that would pass the Turing test?
>
> I agree ONLY in the context of a vastly larger text corpus --- and I
> wonder just how large a corpus would be required ... quite possibly,
> one much larger than all text ever produced in the history of the
> human race...

I don't think it's anywhere near that much.  I read at about 2 KB
per minute, and I listen to speech (if written down as plain text)
at a roughly similar speed.  If you then work it out, by the time
I was 20 I'd read/heard not more than 2 or 3 GB of raw text.
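
A quick check of that figure (the hours-per-day number is my own assumption):

rate_bytes_per_min = 2 * 1024          # "about 2 KB per minute"
hours_per_day = 3                      # assumed average of reading plus conversation
years = 20

total_bytes = rate_bytes_per_min * 60 * hours_per_day * 365 * years
print(f"{total_bytes / 1e9:.1f} GB")   # roughly 2.7 GB, consistent with "2 or 3 GB"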
If you could compress/predict everything that I'd read or heard
until I was 20 years old *amazingly well*, then I'm sure you'd
be able to use this predictive model to easily pass a Turing test.

Indeed it's trivially true: Just have me sit a Turing test when I
was 19.  Naturally I would have passed it, and thus so would
the compressor/predictor (assuming that it's amazingly good,
or at least as good at predicting my responses as I would be).

Shane



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Pei Wang

Matt,

To summarize and generalize data and to use the summary to predict the
future is no doubt at the core of intelligence. However, I do not call
this process "compressing", because the result is not faultless, that
is, there is information loss.

It is not only because human brains are "noisy analog devices",
but because the future is different from the past, and the mind works
under resource restrictions. Only when certain information is
temporarily or permanently ignored (forgotten) can the system
efficiently use its knowledge.

For this reason, I'd make a conjecture that is the opposite of Hutter's: A
necessary condition for a system to be intelligent is that it can
forget.

Of course it is not a sufficient condition for a system to be intelligent. ;-)

Pei


On 8/12/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:

A common objection to compression as a test for AI is that humans can't do 
compression, so it has nothing to do with AI.  The reason people can't compress 
is that compression requires both AI and deterministic computation.  The human 
brain is not deterministic because it is made of neurons, which are noisy 
analog devices.

In order to compress text well, the compressor must be able to estimate probabilities over text 
strings, i.e. predict text.  If you take a text fragment from a book or article and ask someone to 
guess the next character, most people could do so more accurately than any program now in 
existence.  This is clearly an AI problem.  It takes intelligence to predict text like 
"6+8=__" or "roses are ___".  If a data compressor had such knowledge, then it 
could assign the shortest codes to the most likely answers.  Specifically, if the next symbol has 
probability p, it is assigned a code of length log2(1/p) bits.

Such knowledge is useless for human compression because the decompressor must 
have the exact same knowledge to generate the same codes.  This requires 
deterministic computation, which is no problem for a machine.  Therefore I 
believe that compression is a valid test for AI, and desirable because it is 
totally objective, unlike the Loebner prize.

Some known problems with the test:

- To pass the Turing test, a machine must have a model of interactive 
conversation.  The Wikipedia training data lacks examples of dialogs.  My 
argument is that noninteractive text is appropriate for tasks such as OCR, 
language translation, broadcast speech recognition, which is more useful than a 
machine that deliberately makes mistakes and slows its responses to appear 
human.  I think that the problems of learning interactive and noninteractive 
models have a lot of overlap.

- A language model is insufficient for problems with nontextual I/O such as 
vision or robotics that require symbols to be grounded.  True, but a language 
model should be a part of such systems.

- We do not know how much compression is equivalent to AI.  Shannon gave a very 
rough estimate of 0.6 to 1.3 bits per character entropy for written English in 
1950.  There has not been much progress since then in getting better numbers.  
The best compressors are near the high end of this range.  (I did some research 
in 2000 to try to pin down a better number.  
http://cs.fit.edu/~mmahoney/dissertation/entropy1.html )

- It has not been shown that AI can learn from just a lot of text.  I believe 
that lexical, semantic, and syntactic models can be learned from unlabeled 
text.  Children seem to do this.  I doubt that higher level problem solving 
abilities can be learned without some coaching from the programmer, but this is 
allowed.

- The Wikipedia text has a lot of nontext like hypertext links, tables, foreign 
words, XML, etc.  True, but it is 75% text, so a better compressor still needs 
to compress text better.

Most of these issues were brought up in other newsgroups.
http://groups.google.com/group/comp.ai.nat-lang/browse_frm/thread/9411183ccde5f7a1/#
http://groups.google.com/group/comp.compression/browse_frm/thread/3f096aea993273cb/#
http://groups.google.com/group/Hutter-Prize?lnk=li

I also discuss them here.
http://cs.fit.edu/~mmahoney/compression/rationale.html

-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Ben Goertzel <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Cc: Bruce J. Klein <[EMAIL PROTECTED]>
Sent: Saturday, August 12, 2006 12:28:30 PM
Subject: [agi] Marcus Hutter's lossless compression of human knowledge prize

Hi,

About the "Hutter Prize" (see the end of this email for a quote of the
post I'm responding to, which was posted a week or two ago)...

While I have the utmost respect for Marcus Hutter's theoretical work
on AGI, and I do think this prize is an interesting one, I also want
to state that I don't think questing to win the Hutter Prize is a very
effective path to follow if one's goal is to create pragmatic AGI.

Look at it this way: WinZip could compress that dataset further than
*I* could, given a brief period of time to do it.  Yet, who is more
generally 

Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Ben Goertzel

Hi,


My suggestion then is to very slightly modify the test as follows:

Instead of just getting the raw characters, what you get is the
sequence of characters and the probability distribution over the
next character as predicted by a standard compressor.  You
(meaning the algorithm or person being tested) can then choose
to modify this distribution before it is used for compression.

So, for example, when the compressor is extremely certain that
the next characters are "

Yes, I think a hybridized AGI and compression algorithm could do
better than either one on its own  However, this might result in
an incredibly slow compression process, depending on how fast the AGI
thinks.  (It would take ME a long time to carry out this process over
the whole Hutter corpus...)

Also, not all narrow-AI compression algorithms will necessarily be
able to produce output in the style you describe above.  Standard LZ
and PPMC etc. type algorithms can, of course; but some other
algorithms might proceed more holistically and not be able to present
humans with options in a human-comprehensible way.

And you still run up against the problem that, for instance, my
daughter Scheherazade is a pretty smart 9 year old but does not have
all that much declarative knowledge about the topics covered in the
Hutter corpus.  She knows what 3x7 equals but she couldn't complete a
fragment of C++ code any better than the LZ algorithm.  The
combination "Scheherazade + compression algorithm that produces
human-interaction-friendly output" might well underperform some other,
slightly more advanced compression algorithm using a
non-human-interaction-friendly algorithm.

So IMO, your tricky workaround is interesting but doesn't really avoid
the problem...


> However, I am uncertain whether
>
> Amazingly outstanding Hutter test result ==> powerful AGI


At least I think you'll agree that an amazingly outstanding Hutter
test result (possibly on an even larger text corpus that included
conversations etc.) would allow you to then construct a machine
that would pass the Turing test?


I agree ONLY in the context of a vastly larger text corpus --- and I
wonder just how large a corpus would be required ... quite possibly,
one much larger than all text ever produced in the history of the
human race...

-- Ben G



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Shane Legg
That seems clear.

Human-level AGI =/=> Good Hutter test result

just as

Human =/=> Good Hutter test result

My suggestion then is to very slightly modify the test as follows:  Instead
of just getting the raw characters, what you get is the sequence of
characters and the probability distribution over the next character as
predicted by a standard compressor.  You (meaning the algorithm or person
being tested) can then choose to modify this distribution before it is used
for compression.  So, for example, when the compressor is extremely certain
that the next characters are "..." you let it do its thing.  But when the
string so far is "3x7=" and the compressor doesn't seem to know what the
next characters are, you push the compressor in the right direction.

I'm pretty sure that such a combination would easily beat the best
compressors available when used with a human, or a human level
AGI with world knowledge for that matter.  Indeed I think somebody
has already done something like this before with humans.  Maybe
one of the references that Matt gives above.
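
A minimal sketch of that protocol in Python (the "oracle" standing in for the human or AGI being tested, and the base model, are both toy placeholders):

import math

def base_model(context):
    # Toy stand-in for a standard compressor's next-character distribution.
    if context.endswith("roses are "):
        return {"r": 0.9, "b": 0.05, "g": 0.05}
    return {"r": 0.25, "b": 0.25, "g": 0.25, "3": 0.125, "2": 0.125}

def oracle_adjust(context, dist):
    # The tested human/AGI may override the distribution where it knows better,
    # e.g. arithmetic that the base model is clueless about.
    if context.endswith("3x7="):
        return {"2": 0.95, **{k: 0.05 / (len(dist) - 1) for k in dist if k != "2"}}
    return dist                            # otherwise, let the compressor do its thing

def bits_to_code(text, context, model):
    # Code length in bits when each next character is coded with -log2 p.
    total = 0.0
    for ch in text:
        total += -math.log2(model(context).get(ch, 1e-6))
        context += ch
    return total

context = "3x7="
print(bits_to_code("2", context, base_model))                                 # base compressor alone
print(bits_to_code("2", context, lambda c: oracle_adjust(c, base_model(c))))  # with the oracle's push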
However, I am uncertain whether
Amazingly outstanding Hutter test result ==> powerful AGIAt least I think you'll agree that an amazingly outstanding Huttertest result (possibly on an even larger text corpus that included
conversations etc.) would allow you to then construct a machinethat would pass the Turing test?Shane
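A minimal sketch of the mechanism described above, in Python: a base
compressor's next-character distribution is nudged by a smarter predictor
before being handed to the coder.  The mixing weight, the toy distributions,
and the oracle interface are all illustrative assumptions, not anything from
an actual compressor or prize entry.

import math

def mix(base_probs, oracle_probs, weight):
    # Blend the base compressor's next-character distribution with the
    # oracle's (a human or AGI predictor).  weight = 0 ignores the oracle,
    # weight = 1 uses it alone.  Both arguments map characters to probabilities.
    chars = set(base_probs) | set(oracle_probs)
    mixed = {c: (1 - weight) * base_probs.get(c, 0.0)
                + weight * oracle_probs.get(c, 0.0)
             for c in chars}
    total = sum(mixed.values())
    return {c: p / total for c, p in mixed.items()}  # renormalize

def code_length(probs, actual_char):
    # Ideal code length, in bits, for the character that actually occurs.
    return -math.log2(probs[actual_char])

# Illustrative numbers only: after seeing "3x7=", a weak model is nearly
# uniform over the digits, while the oracle puts most of its mass on "2".
base = {str(d): 0.1 for d in range(10)}
oracle = {"2": 0.95, "1": 0.05}

print(code_length(base, "2"))                    # about 3.3 bits unaided
print(code_length(mix(base, oracle, 0.9), "2"))  # well under 1 bit with help

In a real entry an arithmetic coder would consume the mixed distribution;
the point is simply that sharper prediction on the hard spots directly
shortens the code.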



[agi] FYI: The Human Speechome Project

2006-08-12 Thread Pei Wang

See the paper at
http://www.cogsci.rpi.edu/CSJarchive/Proceedings/2006/docs/p2059.pdf

ABSTRACT:

The Human Speechome Project is an effort to observe
and computationally model the longitudinal course of
language development of a single child at an unprecedented
scale. The idea is this: Instrument a child's
home so that nearly everything the child hears and sees
from birth to three is recorded. Develop a computational
model of language learning that takes the child's
audio-visual experiential record as input. Evaluate the
model's performance in matching the child's linguistic
abilities as a means of assessing possible learning strategies
used by children in natural contexts. First steps
of a pilot effort along these lines are described including
issues of privacy management and methods for overcoming
limitations of fully-automated machine perception.



Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Ben Goertzel

Howdy Shane,

I'll try to put my views in your format

I think that

Extremely powerful, vastly superhuman AGI ==> outstanding Hutter test result

whereas

Human-level AGI =/=> Good Hutter test result

just as

Human =/=> Good Hutter test result

and for this reason I consider the Hutter test a deeply flawed
benchmark test to apply to systems on the path of gradually improving,
vaguely humanlike general intelligence.

Next, I think that appropriately-constructed narrow-AI systems will be
able to outperform nearly all human-level AGI systems on the Hutter
test.

I.e., in your format, this means I feel that

Good (e.g. winning) Hutter test result =/=> powerful AGI

However, I am uncertain whether

Amazingly outstanding Hutter test result ==> powerful AGI

Is that clear?

-- Ben




On 8/12/06, Shane Legg <[EMAIL PROTECTED]> wrote:


Ben,

So you think that,

Powerful AGI  ==> good Hutter test result

But you have a problem with the reverse implication,

good Hutter test result =/=> Powerful AGI

Is this correct?

Shane

 




Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Matt Mahoney
A common objection to compression as a test for AI is that humans can't do 
compression, so it has nothing to do with AI.  The reason people can't compress 
is that compression requires both AI and deterministic computation.  The human 
brain is not deterministic because it is made of neurons, which are noisy 
analog devices.

In order to compress text well, the compressor must be able to estimate 
probabilities over text strings, i.e. predict text.  If you take a text 
fragment from a book or article and ask someone to guess the next character, 
most people could do so more accurately than any program now in existence.  
This is clearly an AI problem.  It takes intelligence to predict text like 
"6+8=__" or "roses are ___".  If a data compressor had such knowledge, then it 
could assign the shortest codes to the most likely answers.  Specifically, if 
the next symbol has probability p, it is assigned a code of length log2(1/p) 
bits.

Such knowledge is useless for human compression because the decompressor must 
have the exact same knowledge to generate the same codes.  This requires 
deterministic computation, which is no problem for a machine.  Therefore I 
believe that compression is a valid test for AI, and desirable because it is 
totally objective, unlike the Loebner prize.
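A toy illustration of both points above (the log2(1/p) code lengths, and the
requirement that the compressor and the decompressor run exactly the same
deterministic model); the two-entry prediction table is invented purely for
illustration:

import math

def predict(context):
    # A deterministic toy model.  The identical function must run on both
    # the compressing and the decompressing side, so the decoder can
    # reconstruct exactly the same codes.  (The table is made up.)
    table = {
        "roses are ": {"r": 0.7, "b": 0.2, "w": 0.1},
        "6+8=":       {"1": 0.9, "0": 0.05, "9": 0.05},
    }
    return table.get(context, {})

def ideal_bits(context, next_char):
    # Optimal code length for next_char: log2(1/p) bits.
    p = predict(context)[next_char]
    return -math.log2(p)

print(ideal_bits("roses are ", "r"))  # ~0.51 bits: likely, so cheap to encode
print(ideal_bits("roses are ", "w"))  # ~3.32 bits: unlikely, so expensive

Because predict() is deterministic, a decompressor replaying the same
contexts recovers the same distributions, and hence the same codes, without
ever seeing the original text.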

Some known problems with the test:

- To pass the Turing test, a machine must have a model of interactive 
conversation.  The Wikipedia training data lacks examples of dialogs.  My 
argument is that noninteractive text is appropriate for tasks such as OCR, 
language translation, and broadcast speech recognition, which are more useful than a 
machine that deliberately makes mistakes and slows its responses to appear 
human.  I think that the problems of learning interactive and noninteractive 
models have a lot of overlap.

- A language model is insufficient for problems with nontextual I/O such as 
vision or robotics that require symbols to be grounded.  True, but a language 
model should be a part of such systems.

- We do not know how much compression is equivalent to AI.  Shannon gave a very 
rough estimate of 0.6 to 1.3 bits per character entropy for written English in 
1950.  There has not been much progress since then in getting better numbers.  
The best compressors are near the high end of this range.  (I did some research 
in 2000 to try to pin down a better number.  
http://cs.fit.edu/~mmahoney/dissertation/entropy1.html )

- It has not been shown that AI can learn from just a lot of text.  I believe 
that lexical, semantic, and syntactic models can be learned from unlabeled 
text.  Children seem to do this.  I doubt that higher level problem solving 
abilities can be learned without some coaching from the programmer, but this is 
allowed.

- The Wikipedia text has a lot of nontext like hypertext links, tables, foreign 
words, XML, etc.  True, but it is 75% text, so a better compressor still needs 
to compress text better.

Most of these issues were brought up in other newsgroups.
http://groups.google.com/group/comp.ai.nat-lang/browse_frm/thread/9411183ccde5f7a1/#
http://groups.google.com/group/comp.compression/browse_frm/thread/3f096aea993273cb/#
http://groups.google.com/group/Hutter-Prize?lnk=li

I also discuss them here.
http://cs.fit.edu/~mmahoney/compression/rationale.html
 
-- Matt Mahoney, [EMAIL PROTECTED]

- Original Message 
From: Ben Goertzel <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Cc: Bruce J. Klein <[EMAIL PROTECTED]>
Sent: Saturday, August 12, 2006 12:28:30 PM
Subject: [agi] Marcus Hutter's lossless compression of human knowledge prize


Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Shane Legg
Ben,

So you think that,

Powerful AGI  ==> good Hutter test result

But you have a problem with the reverse implication,

good Hutter test result =/=> Powerful AGI

Is this correct?

Shane



Re: [agi] confirmation paradox

2006-08-12 Thread Charles D Hixson

Yan King Yin wrote:



On 8/11/06, Charles D Hixson <[EMAIL PROTECTED]> wrote:

> While what you say may be true, the typical sentence is of the form
> "Ravens are black". ...
> Sentences of the form "All x are y" are quite rare.  Ditto for sentences
> of the form "Some x are y".
 
(?)  "some x are y" is quite common.
 
YKY


In the response that you made, the only occurrences of  the form "Some x 
are y" were quotations. 
I will go further:
In normal language the use of ANY predicate logic is unusual.  I'll 
grant that many of the constructs can be turned into a predicate logic 
form with sufficient twisting and turning...but it doesn't feel 
natural.  It doesn't feel like a "this is how I was really understanding 
it".
When I check on how I was really understanding things I frequently 
detect signs of a visual model with labeled nodes containing 
commentary.  This commentary is usually prescriptive rather than 
descriptive: it tells you things that should be done rather than describing 
the model.  Now the prescriptive comments tend to be of the form:  If 
you want to achieve this result, do that.  This form looks like 
predicate logic, but it isn't USED as predicate logic.  It's a very 
local rule.  The next step is to model how it would feel if I were to 
make the state transition indicated by the model.  This appears to be 
largely kinesthetic feeling, but it's also specialized.  A certain 
tension in the shoulder means "think very carefully before you do that"; 
however, the words are something that I have added after analysis, and they 
aren't a part of the process.  The tension actually means, if 
interpreted directly, "lift your head up and scan for danger" (and again, 
this is done as muscle tensions, not as words).


OTOH, I'm not certain that everyone uses the same coding systems.   I 
feel that I have reasonable observational grounds to doubt it.  It's 
hard to get deeply enough into someone else's head to know how they are 
thinking at a level that they don't themselves notice, but I feel that 
I've noticed enough to strongly indicate that some people think 
differently than I do.  E.g., my wife has no trouble verbalizing her 
feelings, while for me it is always a matter calling for careful 
thought.  I *feel* them, but verbalizing them is quite difficult.


This is not to say that propositional calculus can't be extended to 
handle this...but I definitely feel that you will need a much stronger 
than minimal computer to handle intelligence with that approach.




Re: [agi] confirmation paradox

2006-08-12 Thread Ben Goertzel

Hi YKY,

You asked:


How can you express "John kicks Mary" in term logic?



One obvious way is: By an Inheritance relation between the ordered
pair (John, Mary) and the term "kick"

If you want to explore term logic, besides Pei's dissertation and
papers, you should look up the book of Sommers and Englebretsen

http://www.bookkoob.co.uk/book/0754613666.htm

[also on Amazon.com, etc.]

IMO, asking basic questions on an email list is not the optimal way to
gain grounding in a moderately deep area of mathematical logic ;-)

-- Ben



Re: [agi] confirmation paradox

2006-08-12 Thread Pei Wang

On 8/12/06, Yan King Yin <[EMAIL PROTECTED]> wrote:


How can you express "John kicks Mary" in term logic?


(*, {John}, {Mary}) --> kick

See http://www.cogsci.indiana.edu/farg/peiwang/PUBLICATION/wang.unifiedAI.pdf
and http://www.cogsci.indiana.edu/farg/peiwang/NARS/

Pei
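For readers new to the notation: the statement above is an inheritance
relation whose subject is a product (compound) term built from {John} and
{Mary}, and whose predicate is the term "kick".  A minimal sketch of how such
a statement might be held as a data structure; the class names and fields are
illustrative only, not NARS's actual internals:

from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Term:
    name: str

@dataclass(frozen=True)
class Product:
    components: Tuple[Term, ...]   # ordered, so (John, Mary) != (Mary, John)

@dataclass(frozen=True)
class Inheritance:
    subject: Union[Term, Product]
    predicate: Term

# "John kicks Mary":  (*, {John}, {Mary}) --> kick
john_kicks_mary = Inheritance(
    subject=Product((Term("{John}"), Term("{Mary}"))),
    predicate=Term("kick"),
)
print(john_kicks_mary)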



Re: [agi] confirmation paradox

2006-08-12 Thread Yan King Yin

On 8/10/06, Pei Wang <[EMAIL PROTECTED]> wrote: 
> I don't think predicate logic plus probability is the way to go, but
> won't try to convince you by email. I've said more in my writings.
True, "standard" predicate logic does not have inferences using induction, abduction, and analogy.  But these can be added to the framework of probabilistic predicate logic.

 
Your argument seems to be that the confirmation paradox makes predicate logic inappropriate, and so you reverted to using term logic.  Actually, predicate logic can handle the term-logic kind of stuff by using appropriate quantifiers such as "there exists some".

> Of course NARS can express much more than just "P is Q". I mentioned that in
> http://www.cogsci.indiana.edu/farg/peiwang/PUBLICATION/wang.cognitive_mathematical.pdf
> and gave more details in my other publications, as well as the demo
> examples.
How can you express "John kicks Mary" in term logic?
 
YKY



Re: [agi] confirmation paradox

2006-08-12 Thread Yan King Yin

On 8/11/06, Charles D Hixson <[EMAIL PROTECTED]> wrote: 
> While what you say may be true, the typical sentence is of the form
> "Ravens are black".  I feel that this should be interpreted as
> "Typically, ravens are black".  Further information as to how reliably
> ravens are black isn't contained in the sentence, and must be derived
> exogenously.  All that tells you is that you should expect a random
> raven chosen without regard to color to be black.  The probability,
> without further information would be somewhere between 50.1% and
> 100%, presuming that you are rating your source as 100% reliable.  So
> there are, at minimum, two figures of merit:  1) the proportion of
> ravens which are black, and 2) how much do you trust the accuracy of the
> information provided by this source?   This is ignoring things like
> sample selection bias of all sorts, including "local data".  If I see a
> swan, it will probably be white.  There are very few black swans in
> California.  If I lived in Australia the answer would be different.
> Then you would have globally there are more white swans, but locally
> there are more black swans.  So even with a totally reliable source I
> would need to guess context.
"Ravens are black" is implicitly understood to be universally quantified.  In everyday language, it could mean "typically ravens are black", as you suggested.  However, p( "typically ravens are black" ) should be 1, because that statement is true.  It is only 
within that statement that some exceptional ravens are nonblack.
 
All I'm saying is that the kind of uncertainty we usually care about, in the sentence "A's are B's", occurs inside the sentence.  Attaching probabilities to sentences does not help in such cases.
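A worked toy example of the distinction (all numbers invented for
illustration): the uncertainty that matters lives in the conditional
probability inside the statement, not in a probability attached to the
sentence as a whole.

p_black_given_raven = 0.98   # the uncertainty we actually care about
p_statement_is_true = 1.0    # P("typically, ravens are black"), treating the
                             # hedged sentence itself as simply true
print(round(1 - p_black_given_raven, 2))   # 0.02: the exceptional, nonblack ravens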

 
> Sentences of the form "All x are y" are quite rare.  Ditto for sentences
> of the form "Some x are y".
(?)  "some x are y" is quite common.
 
YKY
 
 



[agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-12 Thread Ben Goertzel

Hi,

About the "Hutter Prize" (see the end of this email for a quote of the
post I'm responding to, which was posted a week or two ago)...

While I have the utmost respect for Marcus Hutter's theoretical work
on AGI, and I do think this prize is an interesting one, I also want
to state that I don't think questing to win the Hutter Prize is a very
effective path to follow if one's goal is to create pragmatic AGI.

Look at it this way: WinZip could compress that dataset further than
*I* could, given a brief period of time to do it.  Yet, who is more
generally intelligent, me or WinZip??  And who understands the data
better?

Or, consider my 9 year old daughter, who is a pretty bright girl but
does not yet know how to write computer programs (she seems to have
picked up some recessive genes for social well-adjustedness, and is
not as geeky as her dad or big brothers...).  Without a bunch of
further education, she might NEVER be able to compress that dataset
further than WinZip (whereas I could do it by writing a better
compression algorithm than WinZip, given a bit of time).  Yet I submit
that she has significantly greater general intelligence than WinZip.

In short: Compression as a measure of intelligence is only valid if
you leave both processing time and the contextuality of intelligence
out of the picture.

Similarly, my Novamente AI system is not made to perform rapid
compression of data; it's made to try to understand data.  If we
create a Novababy that can act like a human toddler in its 3D
simulation world, this baby will be no better able to compress a
massive dataset than my daughter is.  Yet, I would rate it more
intelligent than WinZip (or even gzip or bzip, whatever) any day.

Potentially, the *best possible* compression achievable using feasible
computational resources would be obtained by something coupling real
understanding (as exists in the human brain) with the kind of
compression tricks that exist in WinZip and so forth.  But creating
this kind of coupled system in software probably requires creating an
AGI (able to rapidly handle real-world situations) and *then* coupling
it with something WinZip-like (able to rapidly handle massive data
compression situations).

So, as with the Loebner Prize, the Hutter Prize has the problem that
the best way to achieve incremental success (become the best in the
world at carrying out the prize task) is NOT really along any sensible
path to achieving dramatic success.

Even though I think a fully advanced Novamente system would be able to
kick the ass of humans or WinZip type programs at the Hutter task, I
don't think this is a good metric by which to assess the progressing
intelligence of early-stage Novamente versions.

Of course, I could be wrong, and maybe AGI will be achieved by making
better and better WinZips -- because eventually to achieve superior
compression, AGI will need to be sneaked into the WinZip code.  But, I
really doubt it.  If this is how it works out, I'll eat my hat (after
compressing it ;-).

I do agree with the perspective presented by Solomonoff, Hutter, Eric
Baum and others that compression is an important way to think about
intelligence.  Indeed, I have waxed long and lyrical about the
centrality of "pattern" to intelligence, and I have defined a pattern
as "a representation as something simpler" -- i.e. as a compression.
But just because compression is fundamental to the mind's operation,
doesn't mean that compressing some computer file is a good test of
intelligence or a good way to define milestones along the path to AGI.
Rather, a feasible-computational-resources AGI carries out a whole
bunch of little context-appropriate compressions regarding the
situations, actions and perceptions that it sees in the real world
when trying to achieve its goals.  There are real mathematical and
conceptual parallels between this process and the process of file
compression, but there are significant conceptual and pragmatic
differences as well...
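One concrete (and deliberately trivial) illustration of "a representation as
something simpler": run-length encoding replaces a repetitive string with a
shorter description of it.  The function below is only a toy chosen to make
the definition concrete, not anything taken from Novamente:

from itertools import groupby

def run_length_encode(s):
    # Represent a string as (character, run length) pairs: the list is a
    # simpler stand-in for, and is recoverable into, the original string.
    return [(ch, len(list(group))) for ch, group in groupby(s)]

print(run_length_encode("aaaaaaabbbbcc"))   # [('a', 7), ('b', 4), ('c', 2)]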

-- Ben G



Introducing the Hutter Prize for Lossless Compression of Human
Knowledge

Artificial intelligence researchers finally have an objective and
rigorously validated measure of the intelligence of their machines.
Furthermore the higher the measured intelligence of their machines the
more money they can win via the Hutter Prize.

The purse for the Hutter Prize was initially underwritten with a 50,000
Euro commitment to the prize fund by Marcus Hutter of the Swiss Dalle
Molle Institute for Artificial Intelligence, affiliated with the
University of Lugano and The University of Applied Sciences of Southern
Switzerland.

The theoretic basis of the Hutter Prize is related to an insight by the
14th century philosopher William of Ockham, called "Ockham's Razor",
sometimes quoted as: "It is vain to do with more what can be done with
less."  But it was not until the year 2000 that this was mathematically
proven*, by Marcus Hutter, to be a founding principle of intelligence.
Indeed, Hutter'