Re: [agi] Epineuronal programming

2009-01-07 Thread Steve Richfield
Abram,

On 1/6/09, Abram Demski abramdem...@gmail.com wrote:

 Well, I *still* think you are wasting your time with flat
 (propositional) learning.


I'm not at all sure that I understand what you are saying here, so some
elaboration is probably in order.

I'm not saying there isn't still progress to
 be made in this area, but I just don't see it as an area where
 progress is critical.


My guess is that the poor performance of non-dp/dt methods is depressing, so
everyone wants to look elsewhere. Damn that yellow stuff, I'm looking for
SILVER. My hope/expectation is that this field can be supercharged with
dp/dt methods.

The main thing that we can do with propositional
 models when we're dealing with relational data is construct
 markov-models.


By Markov you are referring to successive computation processes, e.g.
layers of neurons, each feeding the next?

Markov models are highly prone to overmatching the
 dataset when they become high-order.


Only because the principal components haven't been accurately sorted out by
dp/dt methods?

So far as I am aware,
 improvements to propositional models mainly improve performance for
 large numbers of variables, since there isn't much to gain with only a
 few variables.


Again, I am hoping that enough redundancy can deal with the overlapping
effects of things that occur together, a problem generally eliminated by
dp/dt methods.

(FYI, I don't have much evidence to back up that
 claim.)


When I finally get this all wrung out, I'll move on to using Eddie's NN
platform, which ties into web cams and other complex software or inputs.
Then we should have lots of real-world testing. BTW, with really fast
learning, MUCH larger models can be simulated on the same computers.

So, I don't think progress on the propositional front directly
 translates to progress on the relational front, except in cases where
 we have astronomical amounts of data to prevent overmatching.


In a sense, dp/dt provides another dimension to sort things out. I am
hoping/expecting that LESS dp/dt data is needed this way than with other
competing methods.

Moreover, we need something more than just markov models!


The BIG question is: Can we characterize what is needed?

The transition to hidden-markov-model is not too difficult if we take
 the approach of hierarchical temporal memory; but this is still very
 simplistic.


Most, though certainly not all, elegant solutions are simple. Is dp/dt (and
corollary methods) it or not? THAT is the question.

Any thoughts about dealing with this?


Here is where I am hung up. Rather than respond in excruciating detail with a
presumption of the answer, I'll make the following simplistic statement to
get the process started.

Simple learning methods have not worked well, for the reasons you mentioned
above. The question here is whether dp/dt methods blow past those
limitations in general, and whether epineuronal methods blow past them best
in particular.

Are we on the same page here?

Steve Richfield

On Mon, Jan 5, 2009 at 12:42 PM, Steve Richfield
 steve.richfi...@gmail.com wrote:
  Thanks everyone for helping me wring out the whole dp/dt thing. Now for
  the next part of Steve's Theory...
 
  If we look at learning as extracting information from a noisy channel, in
  which the S/N ratio is usually 1 but is sometimes very high, the WRONG
  thing to do is to engage in some sort of slow averaging process, as
  present slow-learning processes do. This is especially true when
  dp/dt-based methods can occasionally completely separate (in time) the
  signal from the noise.
 
  Instead, it would appear that the best/fastest/cleanest (from an
  information theory viewpoint) way to extract the signal would be to wait
  for a nearly-perfect low-noise opportunity and simply latch on to the
  principal component therein.
 
  Of course there will still be some noise present, regardless of how good
  the opportunity, so some sort of successive refinement process using
  future opportunities could further trim NN synapses, edit AGI terms, etc.
  In short, I see that TWO entirely different learning mechanisms are
  needed, one to initially latch onto an approximate principal component,
  and a second to refine that component.
 
  Processes like this have their obvious hazards, like initially failing
  to incorporate a critical synapse/term, and in the process dooming their
  functionality regardless of refinement. Neurons, principal components,
  equations, etc., that turn out to be worthless, or which are refined into
  nothingness, would simply trigger another epineuronal reprogramming to
  yet another principal component, when a lack of lateral inhibition or
  other AGI-equivalent process detects that something is happening that
  nothing else recognizes.
 
  In short, I am proposing abandoning the sorts of slow learning processes
  typical of machine learning, except for use in gradual refinement of
  opportunistic instantly-recognized principal components.
 
  Any 
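
To make the two-mechanism proposal quoted above concrete, here is a minimal
sketch (my own Python illustration, not Steve's actual algorithm); the SNR
threshold, refinement rate, and pruning cutoff are placeholder assumptions.

    import numpy as np

    SNR_LATCH = 10.0    # assumed threshold for a "nearly-perfect low-noise opportunity"
    REFINE_RATE = 0.05  # assumed slow rate for the follow-up refinement passes
    PRUNE_BELOW = 0.1   # assumed magnitude below which a weak "synapse" is trimmed

    def update(weights, x, snr):
        """Latch onto one clean observation, then slowly refine on later clean ones."""
        x = np.asarray(x, dtype=float)
        x = x / (np.linalg.norm(x) + 1e-12)   # normalize the observed pattern
        if snr < SNR_LATCH:
            return weights                    # noisy sample: ignore it entirely
        if weights is None:
            return x.copy()                   # mechanism 1: one-shot latch onto the pattern
        weights = (1.0 - REFINE_RATE) * weights + REFINE_RATE * x  # mechanism 2: refine
        weights[np.abs(weights) < PRUNE_BELOW] = 0.0               # trim weak components
        return weights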

Re: [agi] Epineuronal programming

2009-01-07 Thread Abram Demski
Steve,

Dp/dt methods do not fundamentally change the space of possible models
(if your initial mathematical claim of equivalence is true). What I am
saying is that that model space is *far* too small. Perhaps you know
some grammar theory? Markov models are not even as expressive as
regular grammars. Hidden Markov models are. But there is a long way to
go from there, since that is just the first level of the hierarchy.
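
A concrete instance of that expressiveness gap (my own illustration in
Python, not from Abram's message): the regular language "strings with an even
number of a's" needs one bit of state that arbitrarily old input can affect,
which no fixed-order Markov window can carry, while a two-state hidden-state
machine handles strings of any length.

    def parity_automaton(s):
        """Two hidden states (even/odd count of 'a') decide membership for any length."""
        even = True
        for ch in s:
            if ch == 'a':
                even = not even
        return even

    def markov_window(s, n):
        """An order-n Markov model conditions only on the last n symbols."""
        return s[-n:]

    # Both strings end in the same 3 symbols, so any order-3 Markov model treats
    # them identically, yet one has an even number of a's and the other does not.
    assert markov_window("aabbb", 3) == markov_window("abbb", 3)
    assert parity_automaton("aabbb") != parity_automaton("abbb")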

 By Markov you are referring to successive computation processes, e.g.
 layers of neurons, each feeding the next?

For sequential data, an Nth-order Markov model is a model that
predicts the next item in the sequence from the last N items. These
can be built by making an N-dimensional table, and running through the
data to count what item appears after each occurrence of each N-item
subsequence. Equivalently, an Nth-order Markov model might store the
probability (/frequency) of each possible sequence of length N+1; in
that case we've got to do some extra calculations to get predictions
out of the model, but mathematically speaking, we've got the same
information in our hands. Markov models can be extended to spatial
data by counting the probabilities of (all possible) squares of some
fixed size. (Circles would work fine too.)
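
As a minimal sketch of that counting construction (my own Python, with the
made-up names train_markov and predict, assuming a discrete sequence):

    from collections import Counter, defaultdict

    def train_markov(sequence, n):
        """Count, for each length-n context, how often each next item follows it."""
        table = defaultdict(Counter)
        for i in range(len(sequence) - n):
            context = tuple(sequence[i:i + n])
            table[context][sequence[i + n]] += 1
        return table

    def predict(table, context):
        """Return the most frequently observed continuation of a length-n context."""
        counts = table.get(tuple(context))
        return counts.most_common(1)[0][0] if counts else None

    # e.g. train_markov("abcabcabc", 2) learns that the context "ab" is followed by "c"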

 Markov models are highly prone to overmatching the
 dataset when they become high-order.


 Only because the principal components haven't been accurately sorted out by
 dp/dt methods?

The reason that overmatching becomes a problem is that the size of the
table grows exponentially with N. There is simply not enough data to
fill the table properly. Let's see... where normal methods would give
a variable the values 1 or 0, derivatives would allow 1, 0, and -1
(positive change, no change, negative change). So for discrete data,
dp/dt will actually make the tables bigger. This could improve
discrimination for low-order models (similar to the effect of
increasing the order), but it will make overmatching worse for
higher-order models (again, similar to the effect of increasing the
order).
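
To put rough numbers on that exponential blow-up (my own back-of-envelope
figures, not Abram's): with k possible values per position and order N, the
context table has k**N entries, so moving from binary data to a three-valued
{-1, 0, +1} derivative encoding multiplies the table size by (3/2)**N.

    # Table sizes for an order-N model over binary data vs. a three-valued
    # derivative encoding of the same data.
    for n in (2, 4, 8, 12):
        binary = 2 ** n    # values {0, 1}
        ternary = 3 ** n   # values {-1, 0, +1} after differencing discrete data
        print(f"order {n:2d}: {binary:6d} vs {ternary:7d} contexts ({ternary / binary:.0f}x larger)")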

Of course, there is an added bonus if the data's regularity really is
represented better by the derivatives.

Come to think of it, it shouldn't be surprising that working in
derivative space is like increasing the order... each unit of
derivative-data represents (the difference between) two units of
normal data.

--Abram


[agi] The Smushaby of Flatway.

2009-01-07 Thread Jim Bromer
All of the major AI paradigms, including those that are capable of
learning, are flat according to my definition.  What makes them flat is that
their method of decision making is minimally structured: they funnel all
reasoning through a single, narrowly focused process that smushes different
inputs together to produce output that can appear reasonable in some cases
but is really flat and lacks any structure for complex reasoning.

The classic example is of course logic.  Every proposition can be described
as being either True or False, and any collection of propositions can be
used in the derivation of a conclusion, regardless of whether the input
propositions had any significant relational structure that would actually
have made it reasonable to draw the definitive conclusion that was drawn
from them.

But logic didn't do the trick, so along came neural networks, and although
the decision making is superficially distributed and can be thought of as
being composed of a structure of layer-like stages in some variations, the
methodology of the system is really just as flat.  Again, anything can be
dumped into the neural network, and a single decision-making process works
on the input through a minimally-structured reasoning system, and output is
produced regardless of the lack of appropriate relative structure in it.  In
fact, this lack of discernment was seen as a major breakthrough!  Surprise:
neural networks did not work just like the mind works, in spite of the years
and years of hype-work that went into repeating this slogan in the 1980's.

Then came Genetic Algorithms, and finally we had a system that could truly
learn to improve on its previous learning.  And how did it do this?  It used
another flat reasoning method, whereby combinations of data components were
processed according to one simple, untiring method that was used over and
over again, regardless of any potential to see the input as being structured
in more ways than one.  Is anyone else starting to discern a pattern here?

Finally we reach the next century to find that the future of AI has already
arrived, and that future is probabilistic reasoning!  And how is
probabilistic reasoning different?  Well, it can solve problems that logic,
neural networks, and genetic algorithms couldn't!  And how does
probabilistic reasoning do this?  It uses a funneling, minimally-structured
method of reasoning whereby any input can be smushed together with other
disparate input to produce a conclusion which is limited only by the human
beings who strive to program it!

The very allure of minimally-structured reasoning is that it works even in
some cases where it shouldn't.  It's the hip hooray and ballyhoo of the
smushababies of Flatway.

Jim Bromer




Re: [agi] The Smushaby of Flatway.

2009-01-07 Thread Matt Mahoney
Logic has not solved AGI because logic is a poor model of the way people think.

Neural networks have not solved AGI because you would need about 10^15 bits
of memory and 10^16 OPS to simulate a human-brain-sized network.
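
(For figures of this order, one common back-of-envelope route is sketched
below; it is not necessarily Matt's own derivation, and the neuron, synapse,
and update-rate counts are rough standard estimates.)

    # ~1e11 neurons x ~1e4 synapses/neuron ~= 1e15 synapses; at roughly 1 bit
    # per synapse that is ~1e15 bits, and at ~10 updates/second ~= 1e16 OPS.
    neurons = 1e11
    synapses_per_neuron = 1e4
    updates_per_second = 10
    memory_bits = neurons * synapses_per_neuron
    ops = memory_bits * updates_per_second
    print(f"{memory_bits:.0e} bits, {ops:.0e} OPS")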

Genetic algorithms have not solved AGI because the computational requirements 
are even worse. You would need 10^36 bits just to model all the world's DNA, 
and even if you could simulate it in real time, it took 3 billion years to 
produce human intelligence the first time.

Probabilistic reasoning addresses only one of the many flaws of first-order
logic as a model of AGI. Reasoning under uncertainty is fine, but you haven't
solved learning by induction, reinforcement learning, complex pattern
recognition (e.g. vision), or language. If it was just a matter of writing the
code, then it would have been done 50 years ago.

-- Matt Mahoney, matmaho...@yahoo.com


Re: [agi] The Smushaby of Flatway.

2009-01-07 Thread Ben Goertzel
  If it was just a matter of writing the code, then it would have been done
 50 years ago.



if proving Fermat's Last Theorem was just a matter of doing math, it would
have been done 150 years ago ;-p

obviously, all hard problems that can be solved have already been solved...

???


