Re: AW: AW: [agi] Language learning (was Re: Defining AGI)

2008-10-23 Thread BillK
On Thu, Oct 23, 2008 at 12:55 AM, Matt Mahoney wrote:


 I suppose you are right. Instead of encoding mathematical rules as a grammar, 
 with enough training
 data you can just code all possible instances that are likely to be 
 encountered. For example, instead
 of a grammar rule to encode the commutative law of addition,

  5 + 3 = a + b = b + a = 3 + 5
 a model with a much larger training data set could just encode instances with 
 no generalization:

  12 + 7 = 7 + 12
  92 + 0.5 = 0.5 + 92
  etc.

 I believe this is how Google gets away with brute-force n-gram statistics
 instead of more sophisticated grammars. Its language model is probably
 10^5 times larger than a human model (10^14 bits vs. 10^9 bits). Shannon
 observed in 1949 that random strings generated by n-gram models of English
 (where n is the number of either letters or words) look like natural
 language up to length 2n. For a typical human-sized model (1 GB of text),
 n is about 3 words. To model strings longer than 6 words we would need
 more sophisticated grammar rules. Google can model 5-grams (see
 http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html),
 so it is able to generate and recognize (and thus appear to understand)
 sentences up to about 10 words.



Gigantic databases are indeed Google's secret sauce.
See:
http://googleresearch.blogspot.com/2008/09/doubling-up.html

Quote:
Monday, September 29, 2008   Posted by Franz Josef Och

Machine translation is hard. Natural languages are so complex and have
so many ambiguities and exceptions that teaching a computer to
translate between them turned out to be a much harder problem than
people thought when the field of machine translation was born over 50
years ago. At Google Research, our approach is to have the machines
learn to translate by using learning algorithms on gigantic amounts of
monolingual and translated data. Another knowledge source is user
suggestions. This approach allows us to constantly improve the
quality of machine translations as we mine more data and
get more and more feedback from users.

A nice property of the learning algorithms that we use is that they
are largely language independent -- we use the same set of core
algorithms for all languages. So this means if we find a lot of
translated data for a new language, we can just run our algorithms and
build a new translation system for that language.

As a result, we were recently able to significantly increase the number of
languages on translate.google.com. Last week, we launched eleven new
languages: Catalan, Filipino, Hebrew, Indonesian, Latvian, Lithuanian, Serbian,
Slovak, Slovenian, Ukrainian, Vietnamese. This increases the
total number of languages from 23 to 34.  Since we offer translation
between any of those languages this increases the number of language
pairs from 506 to 1122 (well, depending on how you count simplified
and traditional Chinese you might get even larger numbers).
-
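A quick check on those pair counts: with n languages there are n * (n - 1)
ordered source/target pairs. A minimal Python sketch of the arithmetic
(illustrative only, not Google's code; the function name is mine):

  # Ordered (source, target) language pairs among n languages: n * (n - 1).
  def language_pairs(n_languages: int) -> int:
      return n_languages * (n_languages - 1)

  print(language_pairs(23))  # 506  (before the launch)
  print(language_pairs(34))  # 1122 (after the eleven new languages)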


BillK




Re: AW: AW: [agi] Language learning (was Re: Defining AGI)

2008-10-23 Thread Mark Waser

I have already proved something stronger


What would you consider your best reference/paper outlining your arguments? 
Thanks in advance.


- Original Message - 
From: Matt Mahoney [EMAIL PROTECTED]

To: agi@v2.listbox.com
Sent: Wednesday, October 22, 2008 8:55 PM
Subject: Re: AW: AW: [agi] Language learning (was Re: Defining AGI)



--- On Wed, 10/22/08, Dr. Matthias Heger [EMAIL PROTECTED] wrote:


You make the implicit assumption that a natural language
understanding system will pass the Turing test. Can you prove this?


If you accept that a language model is a probability distribution over 
text, then I have already proved something stronger. A language model 
exactly duplicates the distribution of answers that a human would give. 
The output is indistinguishable by any test. In fact a judge would have 
some uncertainty about other people's language models. A judge could be 
expected to attribute some errors in the model to normal human variation.



Furthermore, it is just an assumption that the ability to have and to
apply the rules is really necessary to pass the Turing test.

For these two reasons, you still haven't shown 3a and 3b.


I suppose you are right. Instead of encoding mathematical rules as a 
grammar, with enough training data you can just code all possible 
instances that are likely to be encountered. For example, instead of a 
grammar rule to encode the commutative law of addition,


 5 + 3 = a + b = b + a = 3 + 5

a model with a much larger training data set could just encode instances 
with no generalization:


 12 + 7 = 7 + 12
 92 + 0.5 = 0.5 + 92
 etc.

I believe this is how Google gets away with brute-force n-gram statistics
instead of more sophisticated grammars. Its language model is probably 10^5
times larger than a human model (10^14 bits vs. 10^9 bits). Shannon observed
in 1949 that random strings generated by n-gram models of English (where n
is the number of either letters or words) look like natural language up to
length 2n. For a typical human-sized model (1 GB of text), n is about 3
words. To model strings longer than 6 words we would need more sophisticated
grammar rules. Google can model 5-grams (see
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html),
so it is able to generate and recognize (and thus appear to understand)
sentences up to about 10 words.



By the way:
To pass the Turing test, a system must convince 30% of the judges.
Today there is already a system which can convince 25%:

http://www.sciencedaily.com/releases/2008/10/081013112148.htm


It would be interesting to see a version of the Turing test where the
human confederate, machine, and judge all have access to a computer with
an internet connection. I wonder whether this intelligence augmentation
would make the test easier or harder to pass.




-Matthias


 3) you apply rules such as 5 * 7 = 35 -> 35 / 7 = 5 but you have not shown
 3a) that a language understanding system necessarily(!) has these rules
 3b) that a language understanding system necessarily(!) can apply such rules

It must have the rules and apply them to pass the Turing
test.

-- Matt Mahoney, [EMAIL PROTECTED]



-- Matt Mahoney, [EMAIL PROTECTED]











AW: AW: [agi] Language learning (was Re: Defining AGI)

2008-10-22 Thread Dr. Matthias Heger
You make the implicit assumption that a natural language understanding
system will pass the Turing test. Can you prove this?

Furthermore, it is just an assumption that the ability to have and to apply
the rules is really necessary to pass the Turing test.

For these two reasons, you still haven't shown 3a and 3b.

By the way:
To pass the Turing test, a system must convince 30% of the judges.
Today there is already a system which can convince 25%:

http://www.sciencedaily.com/releases/2008/10/081013112148.htm

-Matthias


 3) you apply rules such as 5 * 7 = 35 -> 35 / 7 = 5 but you have not shown
 3a) that a language understanding system necessarily(!) has these rules
 3b) that a language understanding system necessarily(!) can apply such rules

It must have the rules and apply them to pass the Turing test.

-- Matt Mahoney, [EMAIL PROTECTED]







Re: AW: AW: [agi] Language learning (was Re: Defining AGI)

2008-10-22 Thread Matt Mahoney
--- On Wed, 10/22/08, Dr. Matthias Heger [EMAIL PROTECTED] wrote:

 You make the implicit assumption that a natural language
 understanding system will pass the Turing test. Can you prove this?

If you accept that a language model is a probability distribution over text, 
then I have already proved something stronger. A language model exactly 
duplicates the distribution of answers that a human would give. The output is 
indistinguishable by any test. In fact a judge would have some uncertainty 
about other people's language models. A judge could be expected to attribute 
some errors in the model to normal human variation.

 Furthermore, it is just an assumption that the ability to have and to
 apply the rules is really necessary to pass the Turing test.

 For these two reasons, you still haven't shown 3a and 3b.

I suppose you are right. Instead of encoding mathematical rules as a grammar, 
with enough training data you can just code all possible instances that are 
likely to be encountered. For example, instead of a grammar rule to encode the 
commutative law of addition,

  5 + 3 = a + b = b + a = 3 + 5

a model with a much larger training data set could just encode instances with 
no generalization:

  12 + 7 = 7 + 12
  92 + 0.5 = 0.5 + 92
  etc.
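To make the contrast concrete, here is a minimal Python sketch (illustrative
only; the names and the tiny instance table are mine): one grammar rule
covers every instance of commutativity, while an instance table only covers
the strings it has literally seen.

  # One grammar rule covers all instances of the commutative law a + b = b + a.
  def rule_commutes(a: float, b: float) -> bool:
      return a + b == b + a  # true for any a, b

  # Instance-based coverage: only pairs present in the training data.
  SEEN_INSTANCES = {
      "12 + 7 = 7 + 12",
      "92 + 0.5 = 0.5 + 92",
      # ... one entry per memorized instance in a very large training set
  }

  def instance_commutes(statement: str) -> bool:
      return statement in SEEN_INSTANCES  # fails on anything unseen

  print(rule_commutes(3.5, 2.25))              # True for any numbers
  print(instance_commutes("12 + 7 = 7 + 12"))  # True (memorized)
  print(instance_commutes("6 + 1 = 1 + 6"))    # False (never seen)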

I believe this is how Google gets away with brute-force n-gram statistics
instead of more sophisticated grammars. Its language model is probably 10^5
times larger than a human model (10^14 bits vs. 10^9 bits). Shannon observed
in 1949 that random strings generated by n-gram models of English (where n
is the number of either letters or words) look like natural language up to
length 2n. For a typical human-sized model (1 GB of text), n is about 3
words. To model strings longer than 6 words we would need more sophisticated
grammar rules. Google can model 5-grams (see
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html),
so it is able to generate and recognize (and thus appear to understand)
sentences up to about 10 words.
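For what it's worth, a toy Python sketch of such a word n-gram generator
(tiny hand-made corpus, unsmoothed counts, names of my own choosing; purely
to illustrate the idea, not Google's system):

  import random
  from collections import defaultdict

  def train_ngram(words, n):
      """Map each (n-1)-word context to its observed next words."""
      model = defaultdict(list)
      for i in range(len(words) - n + 1):
          model[tuple(words[i:i + n - 1])].append(words[i + n - 1])
      return model

  def generate(model, n, length):
      """Sample text by repeatedly extending the last (n-1)-word context."""
      out = list(random.choice(list(model.keys())))
      while len(out) < length:
          choices = model.get(tuple(out[-(n - 1):]))
          if not choices:
              break
          out.append(random.choice(choices))
      return " ".join(out)

  corpus = "the cat sat on the mat and the dog sat on the rug".split()
  model = train_ngram(corpus, n=3)  # a trigram model
  print(generate(model, n=3, length=8))

Within a window of roughly 2n words the output reads as plausible text;
over longer spans it drifts, which is Shannon's observation in miniature.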

 By the way:
 To pass the Turing test, a system must convince 30% of the judges.
 Today there is already a system which can convince 25%:
 
 http://www.sciencedaily.com/releases/2008/10/081013112148.htm

It would be interesting to see a version of the Turing test where the human
confederate, machine, and judge all have access to a computer with an
internet connection. I wonder whether this intelligence augmentation would
make the test easier or harder to pass.

 
 -Matthias
 
 
  3) you apply rules such as 5 * 7 = 35 -> 35 / 7 = 5 but you have not shown
  3a) that a language understanding system necessarily(!) has these rules
  3b) that a language understanding system necessarily(!) can apply such rules
 
 It must have the rules and apply them to pass the Turing
 test.
 
 -- Matt Mahoney, [EMAIL PROTECTED]


-- Matt Mahoney, [EMAIL PROTECTED]





AW: AW: [agi] Language learning (was Re: Defining AGI)

2008-10-21 Thread Dr. Matthias Heger
Andi wrote

This really seems more like arguing that there is no such thing as
AI-complete at all.  That is certainly a possibility.  It could be that
there are only different competences.  This would also seem to mean that
there isn't really anything that is truly general about intelligence,
which is again possible.

No. This argument shows that there are very basic features which do not
follow necessarily from natural language understanding:

Usage of knowledge.
The example of solving an equation is just one of many examples.
If you can talk about things, this does not imply that you can do things.



I guess one thing we're seeing here is a basic example of mathematics
having underlying mechanisms separate from other features of language.
Lakoff and Nunez talk about subitizing (judging small numbers of things
at a glance) as one core competency, and counting as another. These are
things you can see in animals that do not use language. So, sure,
mathematics could be a separate realm of intelligence.


It is not just mathematics. A natural language understanding system can
talk about shopping, but from this ability you cannot prove that it can
do shopping. Essential features of intelligence are missing from natural
language understanding, and that is the reason why natural language
understanding is not AGI-complete.



Of course, my response to that is that this kind of basic mathematical
ability is needed to understand language.  



This argumentation is nothing other than making a non-AGI-complete system
AGI-complete by adding more and more features.

If you suppose, for an arbitrary still-unsolved problem P, that everything
which is needed to solve AGI is also necessary to solve P, then it becomes
trivial that P is AGI-complete.

But this argumentation is similar to that of the doubters of AGI, who
essentially suppose for an arbitrary given still-unsolved problem P that
P is not computable at all.






-Matthias


Matthias wrote:
 Sorry, but this was no proof that a natural language understanding system
 is necessarily able to solve the equation x*3 = y for arbitrary y.

 1) You have not shown that a language understanding system must
 necessarily(!) have made statistical experiences with the equation x*3 = y.

 2) You give only a few examples. For a proof of the claim, you have to
 prove it for every(!) y.

 3) You apply rules such as 5 * 7 = 35 -> 35 / 7 = 5, but you have not shown
 3a) that a language understanding system necessarily(!) has these rules
 3b) that a language understanding system necessarily(!) can apply such
 rules

 In my opinion a natural language understanding system must have a lot of
 linguistic knowledge.
 Furthermore, a system which can learn natural languages must be able to
 gain linguistic knowledge.

 But both systems do not necessarily(!) have the ability to *work* with
 this knowledge, which is essential for AGI.

 And for this reason natural language understanding is not AGI-complete at
 all.

 -Matthias



 -----Original Message-----
 From: Matt Mahoney [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, October 21, 2008 05:05
 To: agi@v2.listbox.com
 Subject: [agi] Language learning (was Re: Defining AGI)


 --- On Mon, 10/20/08, Dr. Matthias Heger [EMAIL PROTECTED] wrote:

 For instance, I doubt that anyone can prove that any system which
 understands natural language is necessarily able to solve the simple
 equation x * 3 = y for a given y.

 It can be solved with statistics. Take y = 12 and count Google hits:

 string     count
 ------     -----
 1x3=12       760
 2x3=12      2030
 3x3=12      9190
 4x3=12     16200
 5x3=12      1540
 6x3=12      1010
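 In other words, the statistical "solver" is just an argmax over attested
 instances. A minimal Python sketch with the hit counts above hard-coded
 (illustrative only):

   # Google hit counts for "x*3=12", copied from the table above.
   hits = {1: 760, 2: 2030, 3: 9190, 4: 16200, 5: 1540, 6: 1010}

   # "Solve" x * 3 = 12 by picking the most frequently attested instance.
   x = max(hits, key=hits.get)
   print(x)  # 4 -- the correct answer dominates the counts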

 More generally, people learn algebra and higher mathematics by induction,
 by generalizing from lots of examples:

 5 * 7 = 35 -> 35 / 7 = 5
 4 * 6 = 24 -> 24 / 6 = 4
 etc...
 a * b = c -> c / b = a

 It is the same way we learn grammatical rules, for example converting
 active to passive voice and applying the pattern to novel sentences:

 Bob kissed Alice -> Alice was kissed by Bob.
 I ate dinner -> Dinner was eaten by me.
 etc...
 SUBJ VERB OBJ -> OBJ was VERB by SUBJ.
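 A toy Python sketch of applying such a learned template to a novel
 sentence (the participle table is hand-supplied here; a real learner
 would have to induce it from examples):

   import re

   # Past participles for verbs seen in training (hand-supplied here).
   PARTICIPLE = {"kissed": "kissed", "ate": "eaten", "saw": "seen"}

   def to_passive(sentence):
       """Apply the template SUBJ VERB OBJ -> OBJ was VERB-participle by SUBJ."""
       m = re.match(r"(\w+) (\w+) (\w+)$", sentence)
       if not m or m.group(2) not in PARTICIPLE:
           return sentence  # no rule applies
       subj, verb, obj = m.groups()
       return f"{obj} was {PARTICIPLE[verb]} by {subj}"

   print(to_passive("Bob kissed Alice"))  # Alice was kissed by Bob
   print(to_passive("Carol saw Dave"))    # Dave was seen by Carol (novel)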

 In a similar manner, we can learn to solve problems using logical
 deduction:

 All frogs are green. Kermit is a frog. Therefore Kermit is green.
 All fish live in water. A shark is a fish. Therefore sharks live in water.
 etc...
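 The deduction pattern can be sketched the same way, as pure template
 application (a toy Python example of my own, not a serious reasoner):

   # "All X are Y" facts and "Z is an X" facts, as simple lookup tables.
   ALL_X_ARE_Y = {"frog": "green", "fish": "water-dwelling"}
   Z_IS_AN_X = {"Kermit": "frog", "a shark": "fish"}

   def deduce(z):
       x = Z_IS_AN_X[z]      # Z is an X
       y = ALL_X_ARE_Y[x]    # All X are Y
       return f"{z} is {y}"  # therefore Z is Y

   print(deduce("Kermit"))   # Kermit is green
   print(deduce("a shark"))  # a shark is water-dwelling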

 I understand the objection to learning math and logic in a language model
 instead of coding the rules directly. It is horribly inefficient. I
 estimate
 that a neural language model with 10^9 connections would need up to 10^18
 operations to learn simple arithmetic like 2+2=4 well enough to get it
 right
 90% of the time. But I don't know of a better way to learn how to convert
 natural language word problems to a formal language suitable for entering
 into a calculator at the level of an average human adult.
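 One way to arrive at that order of magnitude (a back-of-the-envelope
 assumption of mine, not a derivation from the text): if every training
 presentation touches every connection once, then 10^9 connections times
 roughly 10^9 presentations gives 10^18 operations.

   # Back-of-the-envelope for the 10^18 estimate (assumptions, not data):
   connections = 10**9    # size of the hypothetical neural language model
   presentations = 10**9  # assumed number of training presentations needed
   print(f"{connections * presentations:.0e}")  # 1e+18 operations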

 -- Matt Mahoney, [EMAIL PROTECTED]


