Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-16 Thread Matt Mahoney
If dumb models kill smart ones in text compression, then how do you know they are dumb? What is your objective test of "smart"? The fact is that in speech recognition research, language models with a lower perplexity also have lower word error rates.

We have "smart" statistical parsers that are 60% accurate when trained and tested on manually labeled text. So why haven't we solved the AI problem? Meanwhile, a "dumb" model like matching query words to document words enables Google to answer natural language queries, while our smart parsers choke when you misspell a word. Who is smart and who is dumb?

-- Matt Mahoney, [EMAIL PROTECTED]
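The perplexity/compression connection invoked here is direct: a model's perplexity is 2 raised to the cross-entropy it assigns to text in bits per word, so ranking language models by perplexity and ranking them by compressed size amount to the same test. A quick illustration with made-up numbers (the 1.2 bits per character and 6 characters per word below are assumptions for the example, not figures from this thread):

    import math

    # Made-up figures for illustration only.
    bits_per_char = 1.2    # assumed compression rate of the model
    chars_per_word = 6.0   # assumed average word length, including the space

    bits_per_word = bits_per_char * chars_per_word
    perplexity = 2 ** bits_per_word

    print(bits_per_word)   # 7.2 bits per word
    print(perplexity)      # ~147; lower compressed size <=> lower perplexity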

 


- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, August 16, 2006 9:17:52 AM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

> You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

Very nice example of "homunculus"/"turtles-all-the-way-down" reasoning.

> The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

I believe that it's related to AI . . . . but that the dumbest models kill intelligent models every time . . . . which then makes AI useless for text compression.

And bit-level text storage and reproduction is unnecessary for AI (and adds a lot of needless complexity) . . . .

So why are we combining the two?



Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-16 Thread Mark Waser



> If dumb models kill smart ones in text compression, then how do you know they are dumb?

They are dumb because they are inflexible and always use the same very simple rules. Fortunately, those "dumb" rules are good enough.

> What is your objective test of "smart"?

My definition of smart is that it is 1) flexible, 2) shows a wide variety of behaviors, 3) uses many different rules based upon circumstances to optimize results, and, most importantly, 4) is successful.

My objective test is that if it fails one or more of the above criteria, it is dumb.

> The fact is that in speech recognition research, language models with a lower perplexity also have lower word error rates.

Yes, and what does that have to do with the price of tea in China? You're confusing yourself with irrelevant facts. Reflexes are dumb, but the person with good reflexes is far more likely to avoid/survive a car crash than someone who has to think when things go wrong. However, if two people both don't have reflexes, the one with the faster thought process is more likely to avoid/survive a car crash. I'd still go with the reflexes every time though.

> We have "smart" statistical parsers that are 60% accurate when trained and tested on manually labeled text. So why haven't we solved the AI problem?

Because 60% accuracy sucks. Because what you mean by "accurate" has nothing to do with AI. Because 60% accuracy is NOT successful, and I would call your "smart" statistical parser dumb. I would also call it dumb because statistical parsers are also generally one-trick ponies.

> Who is smart and who is dumb?

Your "smart" parser is dumb because it doesn't work. The Google method is dumb because it is inflexible and always uses the same simple rules. Dumb often works, and "really smart" is smart enough to use dumb when it works.




Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser



> I don't see any point in this debate over lossless vs. lossy compression

Let's see if I can simplify it.

1. The stated goal is compressing human knowledge.
2. The exact same knowledge can always be expressed in a *VERY* large number of different bit strings.
3. Not being able to reproduce the exact bit string is lossy compression when viewed from the bit viewpoint but can be lossless from the knowledge viewpoint.
4. Therefore, reproducing the bit string is an additional requirement above and beyond the stated goal.
5. I strongly believe that this additional requirement will necessitate a *VERY* large amount of additional work not necessary for the stated goal.
6. In addition, by information theory, reproducing the exact bit string will require additional information beyond the knowledge contained in it (since numerous different strings can encode the same knowledge).
7. Assuming optimal compression, also by information theory, additional information will add to the compressed size (i.e. lead to a less optimal result).

So the question is: "Given that bit-level reproduction is harder, not necessary for knowledge compression/intelligence, and doesn't allow for the same degree of compression, why make life tougher when it isn't necessary for your stated purposes and makes your results (i.e. compression) worse?"
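Points 6 and 7 can be made quantitative: if the same piece of knowledge could have been written as any of N equally likely surface strings, a lossless coder must spend roughly log2(N) extra bits (the conditional entropy of the wording given the knowledge) just to pin down which string was actually used. A minimal sketch, with N purely illustrative:

    import math

    def extra_bits(num_equivalent_wordings: int) -> float:
        # Bits needed to say which of N equally likely equivalent wordings
        # was used, beyond the knowledge content itself.
        return math.log2(num_equivalent_wordings)

    print(extra_bits(2))     # 1.0  -- one extra bit to distinguish 2 paraphrases
    print(extra_bits(1024))  # 10.0 -- ten extra bits for 1024 equivalent wordings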


- Original Message -
From: Matt Mahoney
To: agi@v2.listbox.com
Sent: Tuesday, August 15, 2006 12:55 AM
Subject: Re: Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Where will the knowledge to compress text come from? There are 3 possibilities.

1. externally supplied, like the lexical models (dictionaries) for paq8h and WinRK.
2. learned from the input in a separate pass, like xml-wrt|ppmonstr.
3. learned online in one pass, like paq8f and slim.

These all have the same effect on compressed size. In the first case, you increase the size of the decompressor. In the second, you have to append the model you learned from the first pass to the compressed file so it is available to the decompressor. In the third case, compression is poor at the beginning. From the viewpoint of information theory, there is no difference in these three approaches. The penalty is the same.

To improve compression further, you will need to model semantics and/or syntax. No compressor currently does this. I think the reason is that it is not worthwhile unless you have hundreds of megabytes of natural language text. In fact, only the top few compressors even have lexical models. All the rest are byte-oriented n-gram models.

A semantic model would know what words are related, like "star" and "moon". It would learn this by their tendency to appear together. You can build a dictionary of such knowledge from the data set itself or you can build it some other way (such as Wordnet) and include it in the decompressor. If you learn it from the input, you could do it in a separate pass (like LSA) or you could do it in one pass (maybe an equivalent neural network) so that you build the model as you compress.

To learn syntax, you can cluster words by similarity of their immediate context. These clusters correspond to parts of speech. For instance, "the X is" tells you that X is a noun. You can model simple grammars as n-grams over their classifications, such as (Art Noun Verb). Again, you can use any of 3 approaches.

Learning semantics and syntax is a hard problem, but I think you can see it can be done with statistical modeling. The training data you need is in the input itself.

I don't see any point in this debate over lossless vs. lossy compression. You have to solve the language learning problem in either case to improve compression. I think it will be more productive to discuss how this can be done.
  -- Matt Mahoney, [EMAIL PROTECTED]
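The syntax-learning idea above (clustering words by the similarity of their immediate contexts) can be sketched in a few lines. Everything below, including the toy corpus, the one-word context window, and the 0.5 similarity threshold, is an arbitrary choice for illustration, not something specified in the thread:

    from collections import Counter, defaultdict

    corpus = "the cat is here the dog is there a cat was here a dog was there".split()

    # Context profile of each word: counts of (previous word, next word) pairs.
    contexts = defaultdict(Counter)
    for i in range(1, len(corpus) - 1):
        contexts[corpus[i]][(corpus[i - 1], corpus[i + 1])] += 1

    def cosine(a, b):
        dot = sum(a[k] * b[k] for k in a)
        na = sum(v * v for v in a.values()) ** 0.5
        nb = sum(v * v for v in b.values()) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    # Words with similar contexts fall into the same rough class:
    # here "cat"/"dog", "is"/"was", and "the"/"a" pair up.
    words = list(contexts)
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            if cosine(contexts[w], contexts[v]) > 0.5:
                print(w, "~", v)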
  
  



Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Matt Mahoney
I realize it is tempting to use lossy text compression as a test for AI because that is what the human brain does when we read text and recall it in paraphrased fashion. We remember the ideas and discard details about the expression of those ideas. A lossy text compressor that did the same thing would certainly demonstrate AI.

But there are two problems with using lossy compression as a test of AI:
1. The test is subjective.
2. Lossy compression does not imply AI.

Let's assume we solve the subjectivity problem by having human judges evaluate whether the decompressed output is "close enough" to the input. We already do this with lossy image, audio and video compression (without much consensus).

The second problem remains: ideal lossy compression does not imply passing the Turing test. For lossless compression, it can be proven that it does. Let p(s) be the (unknown) probability that s will be the prefix of a text dialog. Then a machine that can compute p(s) exactly is able to generate response A to question Q with the distribution p(QA)/p(Q), which is indistinguishable from human. The same model minimizes the compressed size, E[log 1/p(s)].

This proof does not hold for lossy compression because different lossless models map to identical lossy models. The desired property of a lossy compressor C is that if and only if s1 and s2 have the same meaning (to most people), then the encodings C(s1) = C(s2). This code will ideally have length log 1/(p(s1)+p(s2)). But this does not imply that the decompressor knows p(s1) or p(s2). Thus, the decompressor may decompress to s1 or s2 or choose randomly between them. In general, the output distribution will be different than the true distribution p(s1), p(s2), so it will be distinguishable from human even if the compression ratio is ideal.

-- Matt Mahoney, [EMAIL PROTECTED]
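The argument about p(s) can be made concrete with a toy model. The probability table and helper names below are invented for illustration; the point is only that the same model that sets the ideal code lengths log2 1/p(s) also defines the reply distribution p(QA)/p(Q):

    import math, random

    # Toy model: p(s) over a handful of dialog prefixes (made-up probabilities).
    p = {
        "Q: How are you? A: Fine.": 0.4,
        "Q: How are you? A: Tired.": 0.2,
        "Q: What time is it? A: Noon.": 0.4,
    }

    def code_length_bits(s):
        # Ideal lossless code length under the model: log2 1/p(s).
        return math.log2(1.0 / p[s])

    def reply_distribution(question):
        # p(A | Q) = p(QA) / p(Q), over the strings the model knows.
        matching = {s: q for s, q in p.items() if s.startswith(question)}
        p_q = sum(matching.values())
        return {s: q / p_q for s, q in matching.items()}

    dist = reply_distribution("Q: How are you?")
    print(dist)                                          # Fine: 2/3, Tired: 1/3
    print(code_length_bits("Q: How are you? A: Fine."))  # about 1.32 bits
    print(random.choices(list(dist), weights=list(dist.values()))[0])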

 



Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser



> 1. The test is subjective.

I disagree. If you have an automated test with clear criteria like the following, it will be completely objective:

a) the compressing program must be able to output all inconsistencies in the corpus (in their original string form) AND
b) the decompressing program must be able to do the following when presented with a standard list of test ideas/pieces of knowledge:

FOR EACH IDEA/PIECE OF KNOWLEDGE IN THE TEST WHICH IS NOT IN THE LIST OF INCONSISTENCIES
- if the knowledge is in the corpus, recognize that it is in the corpus.
- if the negation of the knowledge is in the corpus, recognize that the test knowledge is false according to the corpus.
- if an incorrect substitution has been made to create the test item from an item in the corpus (i.e. red for yellow, ten for twenty, etc.), recognize that the test knowledge is false according to the corpus.
- if a possibly correct (hierarchical) substitution has been made to create the test item from an item in the corpus, recognize either a) that the substitution is in the corpus for broader concepts (i.e. testing red for corpus lavender, testing dozens for corpus thirty-seven, etc.) or b) that there is related information in the corpus which the test idea further refines, for narrower substitutions.
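A rough sketch of how an automated grader could walk those criteria. The predicate names on the decompressor object are hypothetical stand-ins for whatever interface an entrant would actually expose; nothing here is a specification from the thread:

    def grade(decompressor, test_items, inconsistencies):
        # Categorize each test item per the criteria above, skipping items
        # that appear in the compressor's declared list of inconsistencies.
        results = {}
        for item in test_items:
            if item in inconsistencies:
                continue
            if decompressor.knowledge_in_corpus(item):
                results[item] = "in corpus"
            elif decompressor.negation_in_corpus(item):
                results[item] = "false according to corpus (negated)"
            elif decompressor.incorrect_substitution(item):
                results[item] = "false according to corpus (bad substitution)"
            elif decompressor.hierarchical_substitution(item):
                results[item] = "broader/narrower match in corpus"
            else:
                results[item] = "unrecognized"
        return results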
> 2. Lossy compression does not imply AI.

and two sentences before

> A lossy text compressor that did the same thing (recall it in paraphrased fashion) would certainly demonstrate AI.

Require that the decompressing program be able to output all of the compressed file's knowledge in ordinary English. This is a pretty trivial task compared to everything else.

Mark


Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Matt Mahoney
You could use Keogh's compression dissimilarity measure to test for inconsistency.
http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf

CDM(x,y) = C(xy)/(C(x)+C(y))

where x and y are strings, and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0 if x and y do not share any information. Then,

CDM("it is hot", "it is very warm") < CDM("it is hot", "it is cold")

assuming your compressor uses a good language model.

Now if only we had some test to tell which compressors have the best language models...

-- Matt Mahoney, [EMAIL PROTECTED]
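For concreteness, a minimal sketch of the CDM computation. zlib is used purely as a stand-in compressor; on strings this short its header overhead dominates, so the numbers mainly illustrate the formula, and the "warm"/"cold" ordering described above presumes a compressor with a real language model:

    import zlib

    def C(s):
        # Compressed size in bytes under a generic lossless compressor.
        return len(zlib.compress(s.encode("utf-8"), 9))

    def CDM(x, y):
        # Keogh's compression dissimilarity measure: C(xy) / (C(x) + C(y)).
        return C(x + y) / (C(x) + C(y))

    print(CDM("it is hot", "it is hot"))        # identical strings: most shared structure
    print(CDM("it is hot", "it is very warm"))
    print(CDM("it is hot", "it is cold"))       # ideally the largest of the three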

 


- Original Message -
From: Mark Waser [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Tuesday, August 15, 2006 3:22:10 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

> Could you please write a test program to objectively test for lossy text compression using your algorithm?

Writing the test program for the decompressing program is relatively easy. Since the requirement was that the decompressing program be able to recognize when a piece of knowledge is in the corpus, when its negation is in the corpus, when an incorrect substitution has been made, and when a correct substitution has been made -- all you/I would need to do is invent (or obtain -- see two paragraphs down) a reasonably sized set of knowledge pieces to test, put them in a file, feed them to the decompressing program, and automatically grade its answers as to which category each falls into. A reasonably small number of test cases should suffice as long as you don't advertise exactly which test cases are in the final test, but once you're having competitors generate each other's tests, you can go hog-wild with the number.

Writing the test program for the compressing program is also easy, but developing the master list of inconsistencies is going to be a real difficulty -- unless you use the various contenders themselves to generate various versions of the list. I strongly doubt that most contenders will get false positives but strongly suspect that finding all of the inconsistencies will be a major area for improvement as the systems become more sophisticated.

Note also that minor modifications of any decompressing program should also be able to create test cases for your decompressor test. Simply ask it for a random sampling of knowledge, for the negations of a random sampling of knowledge, for some incorrect substitutions, and some hierarchical substitutions of each type.

Any *real* contenders should be able to easily generate the tests for you.

> You can start by listing all of the inconsistencies in Wikipedia.

see paragraph 2 above

> To make the test objective, you will either need a function to test whether two strings are inconsistent or not, or else you need to show that people will never disagree on this matter.

It is impossible to show that people will never disagree on a matter.

On the other hand, a knowledge compressor is going to have to recognize when two pieces of knowledge conflict (i.e. when two strings parse into knowledge statements that cannot coexist). You can always have a contender evaluate whether a competitor's "inconsistencies" are incorrect and then do some examination by hand on a representative sample where the contender says it can't tell (since, again, I suspect you'll find few misidentified inconsistencies -- but that finding all of the inconsistencies will be ever subject to improvement).

> > Lossy compression does not imply AI.
> > A lossy text compressor that did the same thing (recall it in paraphrased fashion) would certainly demonstrate AI.
> I disagree that these are inconsistent. Demonstrating and implying are different things.

I didn't say that they were inconsistent. What I meant to say was:

1. that a decompressing program that is able to output all of the compressed file's knowledge in ordinary English would, in your words, "certainly demonstrate AI".
2. given statement 1, it's not a problem that "lossy compression does not imply AI" since the decompressing program would still "certainly demonstrate AI".


Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Philip Goetz

On 8/15/06, Matt Mahoney [EMAIL PROTECTED] wrote:



> I realize it is tempting to use lossy text compression as a test for AI
> because that is what the human brain does when we read text and recall it in
> paraphrased fashion.  We remember the ideas and discard details about the
> expression of those ideas.  A lossy text compressor that did the same thing
> would certainly demonstrate AI.
>
> But there are two problems with using lossy compression as a test of AI:
> 1. The test is subjective.
> 2. Lossy compression does not imply AI.
>
> Let's assume we solve the subjectivity problem by having human judges
> evaluate whether the decompressed output is "close enough" to the input.  We
> already do this with lossy image, audio and video compression (without much
> consensus).
>
> The second problem remains: ideal lossy compression does not imply passing
> the Turing test.  For lossless compression, it can be proven that it does.
> Let p(s) be the (unknown) probability that s will be the prefix of a text
> dialog.  Then a machine that can compute p(s) exactly is able to generate
> response A to question Q with the distribution p(QA)/p(Q) which is
> indistinguishable from human.  The same model minimizes the compressed size,
> E[log 1/p(s)].


This proof is really not useful.  The Turing test is subjective; all
you are saying is that lossy compression is lossy, and lossless
compression is not.  A solution to the first problem would also solve
the second problem.

It is necessary to allow lossy compression in order for this
compression test to be useful for AI, because lossless and
uncomprehending compression is already bumping up against the
theoretical limits for text compression.

- Phil



Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Mark Waser



> You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

The problem is that Keogh's measure is intended for data-mining where you have separate instances, not one big entwined Gordian knot.

> Now if only we had some test to tell which compressors have the best language models...

Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.



Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

2006-08-15 Thread Matt Mahoney
Mark wrote:

> Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.

I'm glad we finally agree :-)

> > You could use Keogh's compression dissimilarity measure to test for inconsistency.
>
> I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

To answer Phil's post: Text compression is only near the theoretical limits for small files. For large files, there is progress to be made integrating known syntactic and semantic modeling techniques into general purpose compressors. The theoretical limit is about 1 bpc and we are not there yet. See the graph at http://cs.fit.edu/~mmahoney/dissertation/

The proof that I gave that a language model implies passing the Turing test is for the ideal case where all people share identical models. The ideal case is deterministic. For the real case where models differ, passing the test is easier because a judge will attribute some machine errors to normal human variation. I discuss this in more detail at http://cs.fit.edu/~mmahoney/compression/rationale.html (text compression is equivalent to AI).

It is really hard to get funding for text compression research (or AI). I had to change my dissertation topic to network security in 1999 because my advisor had funding for that. As a postdoc I applied for a $50K NSF grant for a text compression contest. It was rejected, so I started one without funding (which we now have). The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

-- Matt Mahoney, [EMAIL PROTECTED]
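A minimal sketch of the grouping test described above, again with zlib as a stand-in compressor (as before, a byte-level compressor will not actually detect the inconsistency; the comparison presumes an entrant's compressor with a world model):

    import zlib

    def C(s):
        return len(zlib.compress(s.encode("utf-8"), 9))

    def CDM(x, y):
        return C(x + y) / (C(x) + C(y))

    # Fixed set: the background facts. Variable set: the sentence under test.
    fixed = " ".join([
        "I painted the rose in my favorite color.",
        "My favorite color is pink.",
        "Orange is created by mixing red and yellow.",
        "Pink is created by mixing red and white.",
    ])

    # A compressor that can reason over a world model should find the
    # "white" variant more consistent with the fixed set (lower CDM).
    for color in ("yellow", "white"):
        sentence = "I only used red and %s paint in the painting." % color
        print(color, CDM(fixed, sentence))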

 

