Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Rob Freeman
On Sun, Aug 2, 2020 at 1:58 AM Ben Goertzel  wrote:

> ...
> ...I also think that the search for concise
> abstract models is another part of what's needed...
>

It depends on how you define "concise abstract model". Even maths has an
aspect of contradiction. What does Chaitin call his measure of randomness
in maths? Chaitin's Omega. And he correctly sees this random character as
a power. A feature, not a bug.

Chaitin: "Incompleteness is the first step toward a mathematical theory of
creativity, a bridge between pure mathematics and biology."

http://inference-review.com/article/an-algorithmic-god

So maths has this contradictory truth aspect too. But once you resolve that
with axioms, maths is concise and abstract, yes. You can build a concise
abstract maths on top of that. And sure, it's desirable to do that.


> And I don't think GPT3 is doing this "constantly unpacking structure
> by combining observations, billions of features of it." in the right
> way for AGI ...


No, they're not doing it in the right way. And yes, they will need to
relate it to sensory data in turn, to ground words in the sensory world.
That might be closer to the grounding you want.

In particular I don't think string transformations will translate to a
dynamical model. There might be some trick. But likely they'll need to
change their relational principles again. Difficult to find energy minima
of string transformations in real time.

I think the right structuring model will turn out to be causal invariants,
revealed by setting a sequence network oscillating. That should be dead
simple. And it tantalisingly suggests a role for oscillations, matching the
observation of "binding by synchrony."

But nobody has ears for that. String transformations are currently where
it's at.


> honestly Rob your own ideas feel way closer to the
> mark...


Thanks. And yes I don't think OpenAI are theoretically close to what I'm
suggesting. You're right.

At best I think they are backing into it, blindly, while looking the other
way. By accident, without theory. They don't have a theory of meaning. They
stumbled into string transformations while trying to enhance RNNs. Now
they're stumbling into more and more parameters, without knowing why they
need to.

But if that's where they are, that's the context I have to talk to. I have
to find what I can of value in that and try to show how I believe it can be
better.

What I can find of value is, first, evidence that the more parameters you
generate the better your model becomes - nobody knows why; I say it is
resolution of contradiction in context. And secondly, that those parameters
can be easily generated from principled relationships between parts - even
though they think you have to list them all in advance.

My own theory is that meaning contradicts. That grasping this has been the
problem for 50 years. (It caused Chomsky, for instance, to reject learned
grammar in the '50s and redirect linguistics in the direction of innate
parameters for 60 years!) And to properly capture this contradictory
meaning in context we must constantly generate meaning/parameters from
simple meaningful relationships between parts: causal invariants, string
permutations, transformations... But constantly generating it. A dynamical
system.

Unfortunately the gap between that and the current state-of-the-art is too
great. Unless you can find 9/10 overlap with what people are already doing,
they won't listen. So my best bet is to suggest how folks working on stuff
like GPT-3 might improve what they've done.

My answer is that they might think of calculating their "parameters" in
real time. Not trying to enumerate all the infinite billions of them
beforehand.

But what they are doing is not "right". It's lame that they blow $12M trying
to list every possible meaningful construction in advance. But currently
GPT-3, with its 175 billion parameters and growing, is the best evidence I
have that we need to embrace contradiction, and calculate
meaningful parameters all the time, in a "symbolic network dynamics", as
we've discussed before.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M3eb3c66fee749efaf83a3994
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread immortal . discoveries
"If you could predict the next word.."
"any information and therefore it wouldn't be said.."
And how does that help prediction? It doesn't.

No, use common words when you write books and replies. It helps even the author.
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M1db6ec16add0783c181d16f9
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Alan Grimes via AGI
immortal.discover...@gmail.com wrote:
> I don't know how you guys have been working on AGI for 30+ years and
> still can't say anything about how to actually predict the next word
> clearly using common words and using few words so any audience can
> quickly learn what you know.


Hey, Wheatley! Use some logic here! If you could predict the next word
then it would be equivalent to saying that the word did not communicate
any information and therefore it wouldn't be said, it would be optimized
out of the language.

Human languages are optimized for the normal human processing rate with
some extra vocabulary for super smart ppl and limited, but functional,
subsets for people who still think it's ok to vote for a Democrat...

In general, the information density of spoken language is on the order
of 1 bit per word. Go look up the work of Claude Shannon (iirc) for more
inf0z.

-- 
The vaccine is a LIE. 
#TheHonklerIsReal

Powers are not rights.


--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M9326983743b6a4db4b67ff01
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread immortal . discoveries
I don't know how you guys have been working on AGI for 30+ years and still 
can't say anything about how to actually predict the next word clearly using 
common words and using few words so any audience can quickly learn what you 
know.

Why can't you explain your AGI like this below? They all build on each other too.
Any layman can get it instantly. It's "beautiful", no nonsense or time-wasting.
https://justpaste.it/9h1qp
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Mf566011a3fa6e393620aa4e1
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Rob Freeman
On Sat, Aug 1, 2020 at 7:08 PM Matt Mahoney  wrote:

>
> On Fri, Jul 31, 2020, 10:00 PM Ben Goertzel  wrote:
>
>> I think "mechanisms for how to predict the next word" is the wrong
>> level at which to think about the problem, if AGI is your interest...
>>
>
> Exactly. The problem is to predict the next bit. I mean my interest is in
> compression, but you still have to solve high level language modeling.
> Compression is not well suited to solving other aspects of AGI like
> robotics and vision where the signal is dominated by noise.
>

It doesn't matter if predicting the next word is the right level to think
about a given problem, Matt. What matters is that this is the first time the
symbol grounding problem has been solved for any subset of cognition. For
any problem. This is the first.

I'm not sure Ben thinks it has been solved. He seems to think words are
still detached from their meaning in some important way. I disagree. I
think these GPT-x features are attaching words to meaning.

Perhaps we need a more powerful representation for that meaning. Something
like a hypergraph no doubt. Something that will be populated by relating
text to richer sensory experience, surely. But the grounding is being done,
and this shows us how it can be done. How symbols can be related to
observation.

That's a big thing. And it is also a big thing that the way it has been
solved is by using billions of parameters calculated from simple relational
principles. So not solved by finding some small Holy Grail set of
parameters in one-to-one correspondence with the world in some way, but by
billions of simple parameters formed by combining observations. And
seemingly no limit to how many you need. It matters that it turned out
there appears to be no limit on the number of useful parameters. And it
matters that these limitless numbers of parameters can be calculated from
simple relational principles.

This suggests that the solution to the grounding problem is firstly through
limitless numbers of parameters which can resolve contradictions through
context. But importantly that these limitless numbers of parameters can be
calculated from simple relational principles.

This insight can open the door to symbol grounding for all kinds
of cognitive structures. Personally I think causal invariance will be a big
one. The solution for language, it would seem. Grammar, anyway. I think for
vision too. But there may be others. Different forms of analogy I don't
doubt. But all grounded in limitless numbers of parameters which can
resolve contradictions through context. And those limitless numbers of
parameters all calculated from simple relational principles.

Another way to look at this is to say it suggests to us the solution to the
symbol grounding problem turned out to be an expansion on observation, not
a compression.

You can go on thinking the solution is to find some sanctified Holy Grail
small set of parameters. A God-given kernel of cognition. But meanwhile
what is working is just constantly unpacking structure by combining
observations, billions of features of it. The number is the thing. More
than we imagined. And contradicting but resolved in context. Moving first
to networks, then to more and more parameters over networks. That is what
is actually working. Allowing the network to blow out and generate more and
more billions of parameters, which can resolve contradiction with context.

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Md4d2a1a723ce7c2afad4db23
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Matt Mahoney
On Fri, Jul 31, 2020, 10:00 PM Ben Goertzel  wrote:

> I think "mechanisms for how to predict the next word" is the wrong
> level at which to think about the problem, if AGI is your interest...
>

Exactly. The problem is to predict the next bit. I mean my interest is in
compression, but you still have to solve high level language modeling.
Compression is not well suited to solving other aspects of AGI like
robotics and vision where the signal is dominated by noise.


--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Md1fa276939b4438a9c71714c
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Rob Freeman
How many billion parameters do PLN and TLCG have?

Applications of category theory by Coecke, Sadrzadeh, Clark and others in
the '00s are probably also formally correct.

As were applications of the maths of quantum mechanics. Formally. Does
Dominic Widdows still have that conference? Indeterminacy formalized in
different ways.

But saying stuff is fuzzy, top down, is not the same as resolving that
fuzziness, bottom up.

Natural language grammars drifted to probabilistic frameworks in the '90s.
Never worked worth a damn. That's why Bengio moved to NLMs (neural language
models) in the first place (2003?). Networks, better because... more parameters?

Now, only 20 years later... GPT-3 suggests learning parameters even in the
billions seems to improve things even more. Insight! It seems like there's
no limit to how many parameters are useful. The only problem is the processing
time to list them all beforehand... It is obvious that you need to list them
all beforehand... Right?

You can imagine how OpenAI, or some other team, may finally stumble on the
answer. They'll be pinched for budget, and finally decide it is stupid to
spend $12M calculating 175 billion parameters in advance, when most of them
will never be used. At some point some bright spark will suggest they
calculate relevant parameters at run time.

When they do that they may find they stop worrying about how many
parameters they have. Because they will just be focused on finding
the parameters they need, when they need them.

The actual parameters will be found by some form of existing known
meaningful relationship: permutation, transform, causal invariance... Easy
stuff. Done now. Just the insight will be to do it at run time so they
don't need to worry about how many there are (they may even be surprised to
find they need to be able to contradict.)
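
Purely as an illustration of the precompute-versus-run-time contrast (not a
claim about how GPT-3 or any particular system works), here is a minimal
Python sketch in which the "parameters" are just context-sharing relations,
derived on demand for the words a prompt actually contains; shared-context
overlap stands in for whatever relational principle (permutation, transform,
causal invariance) would really be used:

    from collections import Counter, defaultdict

    class RuntimeRelations:
        """Index raw observations once; derive relations only when asked.
        A hypothetical illustration, not anyone's actual architecture."""

        def __init__(self, corpus_tokens):
            self.contexts = defaultdict(Counter)   # word -> neighbouring words seen with it
            for prev, word in zip(corpus_tokens, corpus_tokens[1:]):
                self.contexts[prev][word] += 1
                self.contexts[word][prev] += 1

        def neighbours(self, word, k=3):
            """Words sharing the most contexts with `word`, computed at run time.
            Nothing about `word` is precomputed beyond the raw counts."""
            target = set(self.contexts[word])
            scores = {
                other: len(target & set(ctx))
                for other, ctx in self.contexts.items() if other != word
            }
            return sorted(scores, key=scores.get, reverse=True)[:k]

    corpus = "the cat ran . the dog ran . the cat jumped . the dog ate".split()
    model = RuntimeRelations(corpus)
    print(model.neighbours("dog"))   # relations for 'dog' are derived only when requested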

From that point symbol binding will be solved, and they'll be free to build
any kind of logic they want on top.

-R

On Sat, Aug 1, 2020 at 3:38 PM Ben Goertzel  wrote:

> Contradictions are an interesting and important topic...
>
> PLN logic is paraconsistent, which Curry-Howard-corresponds to a sort
> of gradual typing
>
> Intuitionistic logic  maps into Type Logical Categorial Grammar (TLCG)
> and such; paraconsistent logic would map into a variant of TLCG in
> which there could be statements with multiple contradictory
> parses/interpretations
>
> In short formal grammar is not antithetical to contradictions at the
> level of syntax, semantics or pragmatics
>
> It is true that GPT3 can capture contradictory and ambiguous aspects
> of language.  However, capturing these without properly drawing
> connections btw abstract patterns and concrete instances, doesn't get
> you very far and isn't particularly a great direction IMO
>
> ben

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M6fd5ed33f971d3ffc5e24cf9
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-08-01 Thread Ben Goertzel
Contradictions are an interesting and important topic...

PLN logic is paraconsistent, which Curry-Howard-corresponds to a sort
of gradual typing

Intuitionistic logic  maps into Type Logical Categorial Grammar (TLCG)
and such; paraconsistent logic would map into a variant of TLCG in
which there could be statements with multiple contradictory
parses/interpretations

In short formal grammar is not antithetical to contradictions at the
level of syntax, semantics or pragmatics

It is true that GPT3 can capture contradictory and ambiguous aspects
of language. However, capturing these without properly drawing
connections between abstract patterns and concrete instances doesn't get
you very far and isn't particularly a great direction IMO

ben

On Fri, Jul 31, 2020 at 8:25 PM Rob Freeman  wrote:
>
> Ben,
>
> By examples do you mean like array reversal in your article?
>
> I agree. This problem may not be addressed by their learning paradigm at all.
>
> But I disagree this has been the biggest problem for symbol grounding.
>
> I think the biggest problem for symbol grounding has been ambiguity. Manifest 
> in language.
>
> So I agree GPT-3 may not be capturing necessary patterns for the kind of 
> reason used in array reversal etc. But I disagree that this kind of reasoning 
> has been the biggest problem for symbol grounding.
>
> Where GPT-3 may point the way is by demonstrating a solution to the ambiguity 
> problem.
>
> That solution may be hidden. They may have stumbled on to the solution simply 
> by virtue of the fact that they have no theory at all! No preconceptions.
>
> I would contrast this with traditional grammar learning. Which does have 
> preconceptions. Traditional grammar learning starts with the preconception 
> that grammar will not contradict. The GPT-x algorithm may not have this 
> expectation. So they may be capturing contradictions and indexing them on 
> context, by accident.
>
> So that's my thesis. The fundamental problem which has been holding us back 
> for symbol grounding is that meaning can contradict. A solution to this, even 
> by accident (just because they had no theory at all?) may still point the way.
>
> And the way it points in my opinion is towards infinite parameters. 
> "Parameters" constantly being generated (and contradiction is necessary for 
> that, because you need to be able to interpret data multiple ways in order to 
> have your parameters constantly grow in number 2^2^2^2)
>
> Grok that problem - contradictions inherent in human meaning - and it will be 
> a piece of cake to build the particular patterns you need for abstract 
> reasoning on top of that. Eliza did it decades ago. The problem was it 
> couldn't handle ambiguity.
>
> -Rob
>
> On Sat, Aug 1, 2020 at 9:40 AM Ben Goertzel  wrote:
>>
>> Rob, have you looked at the examples cited in my article, that I
>> linked here?   Seeing this particular sort of stupidity from them,
>> it's hard to see how these networks would be learning the same sorts
>> of "causal invariants" as humans are...
>>
>> Transformers clearly ARE a full grammar learning architecture, but in
>> a non-AGI-ish sense.  They are learning the grammar of the language
>> underlying their training corpus, but mixed up in a weird and
>> non-human-like way with so many particulars of the corpus.
>>
>> Humans also learn the grammar of their natural languages mixed up with
>> the particulars of the linguistic constructs they've encountered --
>> but the "subtle" point (which obviously you are extremely capable to
>> grok) is that the mixing-up of abstract grammatical patterns with
>> concrete usage patterns in human minds is of a different nature than
>> the mixing-up of abstract grammatical patterns with concrete usage
>> patterns in GPT3 and other transformer networks.   The human form of
>> mixing-up is more amenable to appropriate generalization.
>>
>> ben
>



-- 
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to
live, mad to talk, mad to be saved, desirous of everything at the same
time, the ones who never yawn or say a commonplace thing, but burn,
burn, burn like fabulous yellow roman candles exploding like spiders
across the stars.” -- Jack Kerouac

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Mf7acad4918d160262f7b65f3
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Rob Freeman
Ben,

By examples do you mean like array reversal in your article?

I agree. This problem may not be addressed by their learning paradigm at
all.

But I disagree this has been the biggest problem for symbol grounding.

I think the biggest problem for symbol grounding has been ambiguity.
Manifest in language.

So I agree GPT-3 may not be capturing necessary patterns for the kind of
reason used in array reversal etc. But I disagree that this kind of
reasoning has been the biggest problem for symbol grounding.

Where GPT-3 may point the way is by demonstrating a solution to the
ambiguity problem.

That solution may be hidden. They may have stumbled on to the solution
simply by virtue of the fact that they have no theory at all! No
preconceptions.

I would contrast this with traditional grammar learning. Which does have
preconceptions. Traditional grammar learning starts with the preconception
that grammar will not contradict. The GPT-x algorithm may not have this
expectation. So they may be capturing contradictions and indexing them on
context, by accident.

So that's my thesis. The fundamental problem which has been holding us back
for symbol grounding is that meaning can contradict. A solution to this,
even by accident (just because they had no theory at all?) may still point
the way.

And the way it points in my opinion is towards infinite parameters.
"Parameters" constantly being generated (and contradiction is necessary for
that, because you need to be able to interpret data multiple ways in order
to have your parameters constantly grow in number 2^2^2^2)

Grok that problem - contradictions inherent in human meaning - and it will
be a piece of cake to build the particular patterns you need for abstract
reasoning on top of that. Eliza did it decades ago. The problem was it
couldn't handle ambiguity.

-Rob

On Sat, Aug 1, 2020 at 9:40 AM Ben Goertzel  wrote:

> Rob, have you looked at the examples cited in my article, that I
> linked here?   Seeing this particular sort of stupidity from them,
> it's hard to see how these networks would be learning the same sorts
> of "causal invariants" as humans are...
>
> Transformers clearly ARE a full grammar learning architecture, but in
> a non-AGI-ish sense.  They are learning the grammar of the language
> underlying their training corpus, but mixed up in a weird and
> non-human-like way with so many particulars of the corpus.
>
> Humans also learn the grammar of their natural languages mixed up with
> the particulars of the linguistic constructs they've encountered --
> but the "subtle" point (which obviously you are extremely capable to
> grok) is that the mixing-up of abstract grammatical patterns with
> concrete usage patterns in human minds is of a different nature than
> the mixing-up of abstract grammatical patterns with concrete usage
> patterns in GPT3 and other transformer networks.   The human form of
> mixing-up is more amenable to appropriate generalization.
>
> ben

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M89e52548cccd0a556249a9e8
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread immortal . discoveries
All these papers confuse it and elongate it... it's so simple!
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Ma754039104df6f791b6decd6
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread immortal . discoveries
I want the actual freakin reason why the next word is predicted, for example:

Bark is seen to follow dog 44 times (dog bark, dog bark...), and sleep 4 times
(dog sleep)... so dog > bark is probable.
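
A minimal sketch of that frequency argument in Python (the counts 44 and 4
are just the ones from the example above):

    from collections import Counter

    # Toy counts from the example: what has been seen to follow "dog"?
    follows_dog = Counter({"bark": 44, "sleep": 4})

    total = sum(follows_dog.values())
    probs = {word: count / total for word, count in follows_dog.items()}

    print(probs)   # {'bark': 0.9166..., 'sleep': 0.0833...} -> dog > bark is the probable continuation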
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M206278ecbc50400f5876d44b
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread immortal . discoveries
Yous can't even explain how to predict the next word, let alone clearly. Go
on, tell me how; we will start from there.

Above I explained how to do some pretty ok prediction. Brains are basically 
made of syntax and semantics.
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M25b3187f009d464b3fd8fd1c
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Alan Grimes via AGI
OMG, Goertzel is actually on point... yeah, Wheatley can be frustrating
in a number of ways. I think I got to his core motivation, which I don't
find objectionable in any significant way but he is still quite
solipsistic about his intention to goo the planet and seems to need to
develop his writing skillz more.

Ben Goertzel wrote:
> I think "mechanisms for how to predict the next word" is the wrong
> level at which to think about the problem, if AGI is your interest...
>
> On Fri, Jul 31, 2020 at 6:47 PM  wrote:
>> None of yous are giving a list of mechanisms for how to predict the next 
>> word like I did above. You need to give a clear explanation with a clear 
>> example. And only use words that others know, syntactics is kinda a 
>> bad-word. Frequency is a better word.


-- 
The vaccine is a LIE. 
#TheHonklerIsReal

Powers are not rights.


--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M3f14b4e8b7605638fbb218fb
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Ben Goertzel
I think "mechanisms for how to predict the next word" is the wrong
level at which to think about the problem, if AGI is your interest...

On Fri, Jul 31, 2020 at 6:47 PM  wrote:
>
> None of yous are giving a list of mechanisms for how to predict the next word 
> like I did above. You need to give a clear explanation with a clear example. 
> And only use words that others know, syntactics is kinda a bad-word. 
> Frequency is a better word.



-- 
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to
live, mad to talk, mad to be saved, desirous of everything at the same
time, the ones who never yawn or say a commonplace thing, but burn,
burn, burn like fabulous yellow roman candles exploding like spiders
across the stars.” -- Jack Kerouac

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M18bfc6480ecbf13a0301093a
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread immortal . discoveries
None of yous are giving a list of mechanisms for how to predict the next word
like I did above. You need to give a clear explanation with a clear example.
And only use words that others know; syntactics is kinda a bad word. Frequency
is a better word.
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Me616f5b51e841941bd1cbbaf
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Ben Goertzel
Rob, have you looked at the examples cited in my article, that I
linked here?   Seeing this particular sort of stupidity from them,
it's hard to see how these networks would be learning the same sorts
of "causal invariants" as humans are...

Transformers clearly ARE a full grammar learning architecture, but in
a non-AGI-ish sense.  They are learning the grammar of the language
underlying their training corpus, but mixed up in a weird and
non-human-like way with so many particulars of the corpus.

Humans also learn the grammar of their natural languages mixed up with
the particulars of the linguistic constructs they've encountered --
but the "subtle" point (which obviously you are extremely capable to
grok) is that the mixing-up of abstract grammatical patterns with
concrete usage patterns in human minds is of a different nature than
the mixing-up of abstract grammatical patterns with concrete usage
patterns in GPT3 and other transformer networks.   The human form of
mixing-up is more amenable to appropriate generalization.

ben


On Fri, Jul 31, 2020 at 6:33 PM Rob Freeman  wrote:
>
> On Sat, Aug 1, 2020 at 3:52 AM  wrote:
>>
>> ...
>> Semantics:
>> If 'cat' and 'dog' both share 50% of the same contexts, then maybe the ones 
>> they don't share are shared as well. So you see cat ate, cat ran, cat ran, 
>> cat jumped, cat jumped, cat licked..and dog ate, dog ran, dog ran. 
>> Therefore, probably the predictions not shared could be shared as well, so 
>> maybe 'dog jumped' is a good prediction.
>
>
> Agree with you on this. This is the basis of all grammar learning.
>
> I now believe it may equate to "causal invariants" (same contexts), and so 
> possibly learned by position "transforms" or permutations.
>
> So with "transformers" the RNN guys may have stumbled on a true sense of 
> meaning.
>
> Added to that, the fact that abandoning the original RNN model made it 
> possible to learn hierarchy, may mean GPT-3 is now learning "grammar", with 
> hierarchies and all.
>
> So transformers may equate to a full grammar learning architecture.
>
> It's even possible that because they have no guiding theory, they may be 
> allowing contradictions in their parameters in some way. That would be a big 
> thing from my point of view. I think the doctrinal rejection of contradiction 
> is what is holding back those who formally attempt to learn grammar. Like 
> OpenCog's own historical grammar learning projects.
>
> If GPT-3 is learning grammatical forms which contradict according to context, 
> the only remaining problem from the point of view of my dogma would be that 
> it is trying to learn, not generate, grammar using this (causal invariant, 
> transformer) principle. I think it needs to generate these 175+ parameters on 
> the fly, not try to enumerate them all beforehand at a cost of $12M (Geoff 
> Hinton suggests end the search at 4.398 trillion = 2^42 :-)
>
> -Rob



-- 
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to
live, mad to talk, mad to be saved, desirous of everything at the same
time, the ones who never yawn or say a commonplace thing, but burn,
burn, burn like fabulous yellow roman candles exploding like spiders
across the stars.” -- Jack Kerouac

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M9843aab216f1ab32192d5101
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Rob Freeman
On Sat, Aug 1, 2020 at 3:52 AM  wrote:

> ...
> Semantics:
> If 'cat' and 'dog' both share 50% of the same contexts, then maybe the
> ones they don't share are shared as well. So you see cat ate, cat ran, cat
> ran, cat jumped, cat jumped, cat licked..and dog ate, dog ran, dog ran.
> Therefore, probably the predictions not shared could be shared as well, so
> maybe 'dog jumped' is a good prediction.
>

Agree with you on this. This is the basis of all grammar learning.

I now believe it may equate to "causal invariants" (same contexts), and so
possibly learned by position "transforms" or permutations.

So with "transformers" the RNN guys may have stumbled on a true sense of
meaning.

Added to that, the fact that abandoning the original RNN model made it
possible to learn hierarchy, may mean GPT-3 is now learning "grammar", with
hierarchies and all.

So transformers may equate to a full grammar learning architecture.

It's even possible that because they have no guiding theory, they may be
allowing contradictions in their parameters in some way. That would be a
big thing from my point of view. I think the doctrinal rejection of
contradiction is what is holding back those who formally attempt to learn
grammar. Like OpenCog's own historical grammar learning projects.

If GPT-3 is learning grammatical forms which contradict according to
context, the only remaining problem from the point of view of my dogma
would be that it is trying to learn, not generate, grammar using this
(causal invariant, transformer) principle. I think it needs to generate
these 175-billion-plus parameters on the fly, not try to enumerate them all
beforehand at a cost of $12M (Geoff Hinton suggests ending the search at
4.398 trillion = 2^42 :-)

-Rob

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Mee5f7150dd41c23eb94f5620
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Rob Freeman
I was interested to learn that transformers have now completely abandoned
the RNN aspect, and model everything as sequence "transforms" or
re-orderings.

That makes me wonder if some of the theory does not converge on work I like
by Sergio Pissanetzky, which uses permutations of strings to derive
meaningful objects:

"Structural Emergence in Partially Ordered Sets is the Key to Intelligence"
http://sergio.pissanetzky.com/Publications/AGI2011.pdf

Also interesting because Pissanetzky's original motivation was refactoring
code, and one of the most impressive demonstrations to come out of GPT-3
has been the demo created to express the "meaning" of natural
language in JavaScript.

This could give a sense in which transformers are actually stumbling on
true meaning representations.

-Rob

On Sat, Aug 1, 2020 at 3:45 AM Ben Goertzel  wrote:

> What is your justification/reasoning behind saying
>
> "However GPT-3 definitely is close-ish to AGI, many of the mechanisms
> under the illusive hood are AGI mechanisms."
>
> ?
>
> I don't see it that way at all...

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Mc9f1fbb6be9850b9bab3d990
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Mike Archbold
There has to be a theory of understanding, reasoning, and judging at a
minimum underlying an aspiring AGI design. There is always going to be
a certain trick bag in any AI. If it looks like it is mostly a bag of
tricks, even though they might be REALLY REALLY GOOD tricks, it won't
get you to general AI. But, some people will believe it is AGI.

On 7/31/20, immortal.discover...@gmail.com
 wrote:
> Because it seems GPT-2/3 must be using several mechanisms like the ones that
> follow else it has no chance at predicting well:
> 
> P.S. Attention Heads isn't listed below, that's an important one, it can ex.
> predict a last name accurately by only looking at certain words regardless
> of all others ex. "[Jen Cath] is a girl who [has a mom] named [Tam]
> **Cath**" Tasks=something to do with where it looks, in which order,
> manipulation, etc...
> 
> ---Compressors/MIXERS---
> 
> Syntactics:
> Intro: Letters, words, and phrases re-occur in text. AI finds such patterns
> in data and **mixes** them. We don't store the same letter or phrase twice,
> we just update connection weights to represent frequencies.
> Explanation: If our algorithm has only seen "Dogs eat. Cats eat. Cats sleep.
> My Dogs Bark." in the past, and is prompted with the input "My Dogs" and we
> pay Attention to just 'Dogs' and require an exact memory match, the possible
> predicted futures and their probabilities (frequencies) are 'eat' 50% and
> 'Bark' 50%. If we consider 'My Dogs', we have fewer memories and predict
> 'Bark' 100%. The matched neuron's parent nodes receive split energy from the
> child match.
> 
> BackOff:
> A longer match considers more information but has very little experience,
> while a short match has most experience but little context. A summed **mix**
> predicts better, we look in memory at what follows 'Dogs' and 'My Dogs' and
> blend the 2 sets of predictions to get ex. 'eat' 40% and 'Bark' 60%.
> 
> Semantics:
> If 'cat' and 'dog' both share 50% of the same contexts, then maybe the ones
> they don't share are shared as well. So you see cat ate, cat ran, cat ran,
> cat jumped, cat jumped, cat licked..and dog ate, dog ran, dog ran.
> Therefore, probably the predictions not shared could be shared as well, so
> maybe 'dog jumped' is a good prediction. This helps prediction lots, it lets
> you match a given prompt to many various memories that are similar worded.
> Like the rest above, you mix these, you need not store all seen sentence
> from your experiences. Resulting in fast, low-storage, brain. Semantics
> looks at both sides of a word or phrase, and closer items impact it's
> meaning more.
> 
> Byte Pair Encoding:
> Take a look on Wikipedia, it is really simple and can compress a hierarchy
> too. Basically you just find the most common low-level pair, ex. 'st', then
> you find the next higher-level pair made of those, ex. 'st'+'ar'... It
> segments text well, showing its building blocks.
> 
> More Data:
> Literally just feeding the hierarchy/ heterarchy more data improves its
> prediction accuracy of what word/ building block usually comes next in
> sequence. More data alone improves intelligence, it's actually called
> "gathering intelligence". It does however have slow down at some point and
> requires other mechanisms, like the ones above.
> 
> I have ~16 of these that all merge data to improve prediction... You merge
> to e-merge insights.
> 
> Any AGI will have these

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M04b16f5daad12f3c38d0af2c
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread immortal . discoveries
Because it seems GPT-2/3 must be using several mechanisms like the ones that
follow, or else it has no chance of predicting well:

P.S. Attention Heads aren't listed below; that's an important one. They can,
for example, predict a last name accurately by looking at only certain words
regardless of all others, ex. "[Jen Cath] is a girl who [has a mom] named [Tam]
**Cath**". Tasks = something to do with where it looks, in which order,
manipulation, etc...
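
For readers who want the mechanism behind "Attention Heads" spelled out, here
is a minimal sketch of scaled dot-product attention, the operation a
transformer head computes; the vectors below are random placeholders, not
real word embeddings:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Each query position attends to every key position; the softmax
        weights decide which words dominate the output."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                       # query/key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)        # softmax over positions
        return weights @ V, weights

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(5, 8))   # 5 token positions, 8-dimensional vectors
    K = rng.normal(size=(5, 8))
    V = rng.normal(size=(5, 8))
    out, attn = scaled_dot_product_attention(Q, K, V)
    print(attn.round(2))          # row i shows how much position i "looks at" each position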

---Compressors/MIXERS---

Syntactics:
Intro: Letters, words, and phrases re-occur in text. AI finds such patterns in 
data and **mixes** them. We don't store the same letter or phrase twice, we 
just update connection weights to represent frequencies.
Explanation: If our algorithm has only seen "Dogs eat. Cats eat. Cats sleep. My 
Dogs Bark." in the past, and is prompted with the input "My Dogs" and we pay 
Attention to just 'Dogs' and require an exact memory match, the possible 
predicted futures and their probabilities (frequencies) are 'eat' 50% and 
'Bark' 50%. If we consider 'My Dogs', we have fewer memories and predict 'Bark' 
100%. The matched neuron's parent nodes receive split energy from the child 
match.

BackOff:
A longer match considers more information but has very little experience, while 
a short match has the most experience but little context. A summed **mix** predicts
better: we look in memory at what follows 'Dogs' and 'My Dogs' and blend the 2
sets of predictions to get ex. 'eat' 40% and 'Bark' 60%.
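
A minimal sketch of the Syntactics counting and the BackOff blend in Python,
using the toy corpus above; the even 50/50 blend weight is an arbitrary
choice for illustration:

    from collections import Counter, defaultdict

    corpus = "Dogs eat . Cats eat . Cats sleep . My Dogs Bark .".split()

    # Syntactics: count what follows each 1-word and 2-word context.
    follows = defaultdict(Counter)
    for n in (1, 2):
        for i in range(len(corpus) - n):
            follows[tuple(corpus[i:i + n])][corpus[i + n]] += 1

    def predictions(context):
        counts = follows[tuple(context)]
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()} if total else {}

    # BackOff: blend short-context and long-context predictions.
    def blended(prompt, weight_long=0.5):
        short = predictions(prompt[-1:])    # e.g. what follows 'Dogs'
        long = predictions(prompt[-2:])     # e.g. what follows 'My Dogs'
        return {w: (1 - weight_long) * short.get(w, 0) + weight_long * long.get(w, 0)
                for w in set(short) | set(long)}

    print(predictions(["Dogs"]))         # {'eat': 0.5, 'Bark': 0.5}
    print(predictions(["My", "Dogs"]))   # {'Bark': 1.0}
    print(blended(["My", "Dogs"]))       # 'Bark' favoured over 'eat' once the two are mixed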

Semantics:
If 'cat' and 'dog' both share 50% of the same contexts, then maybe the ones 
they don't share are shared as well. So you see cat ate, cat ran, cat ran, cat 
jumped, cat jumped, cat licked..and dog ate, dog ran, dog ran. Therefore, 
probably the predictions not shared could be shared as well, so maybe 'dog 
jumped' is a good prediction. This helps prediction lots, it lets you match a 
given prompt to many various memories that are similarly worded. Like the rest
above, you mix these; you need not store every seen sentence from your
experiences. Resulting in a fast, low-storage brain. Semantics looks at both
sides of a word or phrase, and closer items impact its meaning more.
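
A minimal sketch of that Semantics step in Python, using the toy cat/dog
counts from the example; shared-context overlap (Jaccard) is one simple
similarity choice among many:

    from collections import Counter

    # Toy counts of what follows each word, from the example above.
    follows = {
        "cat": Counter({"ate": 1, "ran": 2, "jumped": 2, "licked": 1}),
        "dog": Counter({"ate": 1, "ran": 2}),
    }

    def overlap(a, b):
        """Jaccard overlap of the two words' context sets."""
        shared = set(follows[a]) & set(follows[b])
        return len(shared) / len(set(follows[a]) | set(follows[b]))

    # 'dog' shares contexts with 'cat', so borrow cat's unshared predictions for dog.
    sim = overlap("dog", "cat")
    borrowed = {w: sim for w in follows["cat"] if w not in follows["dog"]}

    print(sim)        # 0.5 -> half of their combined contexts are shared
    print(borrowed)   # {'jumped': 0.5, 'licked': 0.5} -> 'dog jumped' becomes a plausible prediction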

Byte Pair Encoding:
Take a look on Wikipedia, it is really simple and can compress a hierarchy too. 
Basically you just find the most common low-level pair, ex. 'st', then you
find the next higher-level pair made of those, ex. 'st'+'ar'... It segments text
well, showing its building blocks.
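
A minimal sketch of the byte-pair merge step in Python; the algorithm is the
standard one, and the input string is just a placeholder:

    from collections import Counter

    def most_common_pair(tokens):
        """Count adjacent token pairs and return the most frequent one."""
        return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

    def merge(tokens, pair):
        """Replace every occurrence of the pair with a single merged token."""
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        return merged

    tokens = list("star start stars artist")   # placeholder text, split into characters
    for _ in range(4):                          # each round builds a bigger block: 'st', 'ar', 'star', ...
        pair = most_common_pair(tokens)
        tokens = merge(tokens, pair)
        print(pair, tokens)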

More Data:
Literally just feeding the hierarchy/ heterarchy more data improves its 
prediction accuracy of what word/ building block usually comes next in 
sequence. More data alone improves intelligence, it's actually called 
"gathering intelligence". It does however have slow down at some point and 
requires other mechanisms, like the ones above.

I have ~16 of these that all merge data to improve prediction... You merge to
e-merge insights.

Any AGI will have these
--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-M9a05153b318f3a0b029f98bf
Delivery options: https://agi.topicbox.com/groups/agi/subscription


Re: [agi] Re: GPT3 -- Super-cool but not a path to AGI (

2020-07-31 Thread Ben Goertzel
What is your justification/reasoning behind saying

"However GPT-3 definitely is close-ish to AGI, many of the mechanisms
under the illusive hood are AGI mechanisms."

?

I don't see it that way at all...

On Fri, Jul 31, 2020 at 12:39 PM  wrote:
>
> Follows is everything I got out of that long eyed post:
>
> "This is really, really weird in terms of the way human mind approach 
> arithmetic, right?   For a human who knows how to do 2-3 digit arithmetic, 
> the error rate at 4-5 digit arithmetic — when given time and motivation for 
> doing the arithmetic problems — is going to be either 0% or very close to 0%, 
> or else way closer to 100%.   Once a human learns the basic algorithms of 
> arithmetic, they can apply them at any size, unless they make sloppy errors 
> or just run out of patience."
> Thank you. I agree.
> BTW when humans don't know the answer, they use small things they do know to
> carry over numbers and solve bigger equations. This requires a learnt Task
> Pattern. GPT learns Task Patterns; not sure why it didn't here :) I think
> nowhere in the dataset was how to do arithmetic, nor the correct
> mechanisms to carry out sequences of complex tasks.
>
> You mention:
> Q: Reverse the following array: [1, 3, 5, 6, 10, 4, 2, 77]
> A: [10, 6, 4, 2, 77, 3, 5, 1]
> and
> Q: How many eyes does my foot have?
> A: Your foot has two eyes.
> Again, this is just Tasks. The latter question is a simple pattern-question
> thing that is a sad fault; surely this can be fixed by some simple trick.
>
> "Given all the ridiculous wastes of resources in modern society, it’s hard to 
> get too outraged at the funds spent on GPT3"
> Totally agree.
> "if one focuses on the fairly limited pool of resources currently being spent 
> on advanced AI systems without direct commercial application, one wonders 
> whether we’d be better off to focus more of this pool on fundamental 
> innovations in representation, architecture, learning, creativity, empathy 
> and human-computer interaction, rather than on scaling up transformers bigger 
> and bigger."
> This is true. And we can see the accuracy curve; we knew a 100x bigger GPT-2
> would result in little improvement, so why do they do this? Maybe because
> they're selling an API.
>
> However GPT-3 definitely is close-ish to AGI, many of the mechanisms under 
> the illusive hood are AGI mechanisms. Like turtle > man, the limbs are there, 
> the eyes, the head, the butt, the spine, the lungs, just doesn't look like man
> so much... but it's so frikin close!



-- 
Ben Goertzel, PhD
http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to
live, mad to talk, mad to be saved, desirous of everything at the same
time, the ones who never yawn or say a commonplace thing, but burn,
burn, burn like fabulous yellow roman candles exploding like spiders
across the stars.” -- Jack Kerouac

--
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T21c073d3fe3faef0-Mb642e25f199626d840ca817b
Delivery options: https://agi.topicbox.com/groups/agi/subscription