[agi] Flexibility of AI vs. a PC

2007-12-05 Thread William Pearson
One thing that has been puzzling me for a while is why some people
expect an intelligence to be less flexible than a PC.

What do I mean by this? A PC can have any learning algorithm, bias or
representation of data we care to create. This raises another
question: how are we creating a representation if not by copying it, in
some sense, from our brains? So why do we still create systems that
have fixed representations of the external world and fixed methods of
learning?

Take the development of echolocation in blind people, or the ability
to take in visual information from stimulation of the tongue. Isn't this
sufficient evidence to suggest we should be trying to make our AIs as
flexible as the most flexible things we know?

 Will Pearson



Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Mark Waser
Interesting.  Since I am interested in parsing, I read Collins's paper.  It's a 
solid piece of work (though with the stated error percentages, I don't believe 
that it really proves anything worthwhile at all) -- but your 
over-interpretations of it are ridiculous.

You claim that "It is actually showing that you can do something roughly 
equivalent to growing neural gas (GNG) in a space with something approaching 
500,000 dimensions, but you can do it without normally having to deal with more 
than a few of those dimensions at one time."  Collins makes no claims that even 
remotely resemble this.  He *is* taking a deconstructionist approach (which 
Richard and many others would argue vehemently with) -- but that is virtually 
the entirety of the overlap between his paper and your claims.  Where do you 
get all this crap about 500,000 dimensions, for example?

You also make statements that are explicitly contradicted in the paper.  For 
example, you say "But there really seems to be no reason why there should be any 
limit to the dimensionality of the space in which Collins's algorithm works, 
because it does not use an explicit vector representation" while his paper 
quite clearly states "Each tree is represented by an n dimensional vector where 
the i'th component counts the number of occurrences of the i'th tree fragment." 
(A mistake I believe you made because you didn't understand the preceding 
sentence -- or, more critically, *any* of the math.)

Are all your claims on this list this far from reality if one pursues them? 


- Original Message - 
From: Ed Porter [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Tuesday, December 04, 2007 10:52 PM
Subject: RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]


The particular NL parser paper in question, Collins's "Convolution Kernels
for Natural Language"
(http://l2r.cs.uiuc.edu/~danr/Teaching/CS598-05/Papers/Collins-kernels.pdf),
is actually saying something quite important that extends way beyond parsers
and is highly applicable to AGI in general.  

It is actually showing that you can do something roughly equivalent to
growing neural gas (GNG) in a space with something approaching 500,000
dimensions, but you can do it without normally having to deal with more than
a few of those dimensions at one time.  GNG is an algorithm I learned about
from reading Peter Voss that allows one to learn how to efficiently
represent a distribution in a relatively high dimensional space in a totally
unsupervised manner.  But there really seems to be no reason why there should
be any limit to the dimensionality of the space in which Collins's
algorithm works, because it does not use an explicit vector representation,
nor, if I recollect correctly, a Euclidean distance metric, but rather a
similarity metric, which is generally much more appropriate for matching in
very high dimensional spaces.
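
(For concreteness, a minimal Python sketch -- illustrative only; the tuple
tree encoding, the matching of lexical leaves, and the damping factor lam are
assumptions, not details from the paper -- of a Collins/Duffy-style
convolution tree kernel that scores two parse trees by the subtree fragments
they share, without ever building the explicit fragment-count vector:

# A tree is a tuple: (label, [children]); leaves have an empty child list.

def production(node):
    # The "production" at a node: its label plus the labels of its children.
    label, children = node
    return (label, tuple(child[0] for child in children))

def nodes(tree):
    # All nodes of a tree, in pre-order.
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def shared_at(n1, n2, lam=0.5):
    # Count of shared subtree fragments rooted at n1 and n2, damped by lam.
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for ch1, ch2 in zip(n1[1], n2[1]):
        score *= 1.0 + shared_at(ch1, ch2, lam)
    return score

def tree_kernel(t1, t2, lam=0.5):
    # Sum the shared-fragment counts over all pairs of nodes.
    return sum(shared_at(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

# Two toy parse trees that share the NP -> D N structure with D = "the".
t_a = ('S', [('NP', [('D', [('the', [])]), ('N', [('cat', [])])]),
             ('VP', [('V', [('sat', [])])])])
t_b = ('NP', [('D', [('the', [])]), ('N', [('dog', [])])])

print(tree_kernel(t_a, t_b))  # positive, because the trees share structure

The point is that the kernel only ever touches node pairs that actually
match, however enormous the implicit fragment space is.)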

But what he is growing are not just points representing where data has
occurred in a high dimensional space, but sets of points that define
hyperplanes for defining the boundaries between classes.  My recollection is
that this system learns automatically from both labeled data (instances of
correct parse trees) and randomly generated deviations from those instances.
His particular algorithm matches tree structures, but with modification it
would seem to be extendable to matching arbitrary nets.  Other versions of
it could be made to operate, like GNG, in an unsupervised manner.

If you stop and think about what this is saying and generalize from it, it
provides an important possible component in an AGI tool kit. What it shows
is not limited to parsing, but it would seem possibly applicable to
virtually any hierarchical or networked representation, including nets of
semantic web RDF triples, and semantic nets, and predicate logic
expressions.  At first glance it appears it would even be applicable to
kinkier net matching algorithms, such as Augmented Transition Network
(ATN) matching.

So if one reads this paper with a mind to not only what it specifically
shows, but to how what it shows could be expanded, this paper says
something very important.  That is, that one can represent, learn, and
classify things in very high dimensional spaces -- such as 10^1
dimensional spaces -- and do it efficiently provided the part of the space
being represented is sufficiently sparsely connected.

I had already assumed this, before reading this paper, but the paper was
valuable to me because it provided a mathematically rigorous support for my
prior models, and helped me better understand the mathematical foundations
of my own prior intuitive thinking.  

It means that systems like Novamente can deal in very high dimensional
spaces relatively efficiently. It does not mean that all processes that can
be performed in such spaces will be computationally cheap (for example,
combinatorial searches), but it means that many of them, such as GNG-like
recording of 

Re: [agi] None of you seem to be able ...

2007-12-05 Thread Mike Tintner


Ben:  Obviously the brain contains answers to many of the unsolved problems 
of AGI (not all -- e.g. not the problem of how to create a stable goal system 
under recursive self-improvement).  However, current neuroscience does 
NOT contain these answers. And neither you nor anyone else has ever made 
a cogent argument that emulating the brain is the ONLY route to creating 
powerful AGI.


Absolutely agree re neuroscience's lack of answers (hence Richard's 
assertion that his system is based on what cognitive science knows about 
brain architecture is not a smart one - the truth is it knows not much at all.)


The cogent argument for emulating the brain - in brief - is simply that it's 
the only *all-rounder* cognitive system, the only multisensory, multimedia, 
multi-sign system that can solve problems in language AND maths 
(arithmetic/algebra/geometry) AND diagrams AND maps AND photographs AND 
cinema AND painting AND sculpture & 3-D models AND body language etc - 
and switch from solving problems in any one sign or sensory system to 
solving the same problems in any other sign or sensory system. And it's by 
extension the only truly multidomain system that can switch from solving 
problems in any one subject domain to any other, from solving problems of 
how to play football to how to marshal troops on a battlefield to how to do 
geometry, applying the same knowledge across domains.  (I'm just 
formulating this argument for the first time - so it will no doubt need 
revisions!)  But - correct me - I don't think there's any AI system that's 
even a two-rounder, able to work across two domains and sign systems, let 
alone, of course, all of them. (And it's taken a billion years to evolve this 
all-round system, which is clearly grounded in a body.)


It LOOKS relatively straightforward to emulate or supersede this system 
when you make the cardinal error of drawing specialist comparisons - your 
we-can-make-a-plane-that-flies-faster-than-a-bird argument (and of course we 
already have machines that can think billions of times faster than the 
brain). But inventing general, all-round systems that are continually alive - 
complex psychoeconomies managing whole sets of complex activities in the 
real, as opposed to artificial, world(s), and not just isolated tasks - is a 
whole different ballgame from inventing specialist systems.


It represents a whole new stage of machine evolution - a step as drastic as 
the evolution of life from matter - and you, sir, :), have scant respect for 
the awesomeness of the undertaking (even though, paradoxically, you're much 
more aware than most of its complexity). Respect to the brain, bro!


It's a little as if you - not, I imagine, the very finest athletic 
specimen - were to say: hey, I can take the heavyweight champ of the world 
... AND Federer... AND Tiger Woods... AND the champ of every other sport. 
Well, yeah, you can indeed box and play tennis and actually do every other 
sport, but there's an awful lot more to beating even one of those champs, let 
alone all or a selection of them, than meets the eye (even if you were in 
addition to have a machine that could throw super-powerful punches or play 
superfast backhands).


Ben/MT:  none of the unsolved problems are going to be solved - without major 
creative leaps. Just look even at the iPod & iPhone - major new technology 
never happens without such leaps.


Ben: The above sentence is rather hilarious to me. If the iPod and iPhone 
are your measure for creative leaps, then there have been loads and loads of 
major creative leaps in AGI and narrow-AI research. As an example of a 
creative leap (that is speculative and may be wrong, but is certainly 
creative), check out my hypothesis of emergent social-psychological 
intelligence as related to mirror neurons and octonion algebras:

http://www.goertzel.org/dynapsyc/2007/mirrorself.pdf


Ben,

Name ONE major creative leap in AGI  (in narrow AI, no question, there's 
loads).


Some background here: I am deeply interested in, & have done a lot of work 
on, the psychology & philosophy of creativity, as well as intelligence.


So your creative paper is interesting to me, because it helps refine 
definitions of creativity and creative leaps.  The iPod & iPhone do indeed 
represent brilliant leaps in terms of interfaces - with the touch-wheel and 
the pinch touchscreen [as distinct from the touchscreen itself] - v. neat 
lateral ideas which worked. No, not revolutionary in terms of changing vast 
fields of technology, just v. lateral, unexpected, albeit simple ideas. I 
have seen no similarly lateral approaches in AGI.


Your paper represents almost a literal application of the idea that 
creativity is ingenious/lateral. Hey, it's no trick to be just 
ingenious/lateral or fantastic. How does memory work? - well, you see, 
there's this system of angels that ferry every idea you have and file it in 
an infinite set of multiverses...etc...  Anyone can come up with fantastic 
ideas. The 

RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Ed Porter
Dave, 

 

Thanks for the link.  Seems like it gives Matt the right to say to the world,
"I told you so."  

 

I wonder if OpenCog could get involved in this, or something like this, in a
productive way.

 

Ed Porter

 

-Original Message-
From: David Hart [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 05, 2007 3:16 AM
To: agi@v2.listbox.com
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

 

On 12/5/07, Matt Mahoney [EMAIL PROTECTED] wrote:


[snip]  Centralized search is limited to a few big players that
can keep a copy of the Internet on their servers.  Google is certainly
useful,
but imagine if it searched a space 1000 times larger and if posts were 
instantly added to its index, without having to wait days for its spider to
find them.  Imagine your post going to persistent queries posted days
earlier.
Imagine your queries being answered by real human beings in addition to
other 
peers.

I probably won't be the one writing this program, but where there is a need,
I
expect it will happen.



Wikia, the company run by Wikipedia founder Jimmy Wales, is tackling the
Internet-scale distributed search problem -
http://search.wikia.com/wiki/Atlas

Connecting to related threads (some recent, some not-so-recent), the Grub
distributed crawler ( http://search.wikia.com/wiki/Grub ) is intended to be
one of many plug-in Atlas Factories. A development goal for Grub is to
enhance it with a NL toolkit (e.g. the soon-to-be-released RelEx), so it can
do more than parse simple keywords and calculate statistical word
relationships. 

-dave

 

 


Re: [agi] None of you seem to be able ...

2007-12-05 Thread Richard Loosemore

Ed Porter wrote:

RICHARD LOOSEMORE There is a high prima facie *risk* that intelligence
involves a 
significant amount of irreducibility (some of the most crucial 
characteristics of a complete intelligence would, in any other system, 
cause the behavior to show a global-local disconnect),



ED PORTER= Richard, "prima facie" means obvious on its face.  The above
statement and those that followed it below may be obvious to you, but they are
not obvious to a lot of us, and at least I have not seen (perhaps because of
my own ignorance, but perhaps not) any evidence that they are obvious.
Apparently Ben also does not find your position to be obvious, and Ben is no
dummy.

Richard, did you ever just consider that it might be "turtles all the way
down", and by that I mean experiential patterns, such as those that could be
represented by Novamente atoms (nodes and links) in a gen/comp hierarchy,
all the way down?  In such a system each level is quite naturally derived
from the levels below it by learning from experience.  There is a lot of dynamic
activity, but much of it is quite orderly, like that in Hecht-Nielsen's
Confabulation.  There is no reason why there has to be a GLOBAL-LOCAL
DISCONNECT of the type you envision, i.e., one that is totally impossible
to architect in terms of until one totally explores global-local disconnect
space (just think how large an exploration space that might be).

So if you have prima facie evidence to support your claim (other than your
paper which I read which does not meet that standard


Ed,

Could you please summarize for me what your understanding is of my claim 
for the prima facie evidence (that I gave in that paper), and then, if 
you would, please explain where you believe the claim goes wrong.


With that level of specificity, we can discuss it.

Many thanks,



Richard Loosemore



), then present it.  If

you make me eat my words you will have taught me something sufficiently
valuable that I will relish the experience.





Re: [agi] None of you seem to be able ...

2007-12-05 Thread Richard Loosemore

Mike Tintner wrote:

Richard:  science does too know a good deal about brain
architecture! ... I *know* cognitive science.  Cognitive science is a friend
of mine.  Mike, you are no cognitive scientist :-).

Thanks, Richard,  for keeping it friendly - but -   are you saying cog 
sci knows the:


*'engram' - how info is encoded
*any precise cognitive form or level of the hierarchical processing 
vaguely defined by Hawkins et al

*how ideas are compared at any level -
*how analogies are produced
*whether templates or similar are/are not used in visual object processing

etc. etc ???


Well, you are crossing over between levels here in a way that confuses me.

Did you mean brain architecture when you said "brain architecture"? 
That is, are you talking about brain-level stuff, or cognitive-level 
stuff?  I took you to be talking quite literally about the neural level.


More generally, though, we understand a lot, but of course the picture 
is extremely incomplete.  But even though the picture is incomplete that 
would not mean that cognitive science knows almost nothing.


My position is that cog sci has a *huge* amount of information stashed 
away, but it is in a format that makes it very hard for someone trying 
to build an intelligent system to actually use.  AI people make very 
little use of this information at all.


My goal is to deconstruct cog sci in such a way as to make it usable in 
AI.  That is what I am doing now.



Obviously, if science can't answer the engram question, it can hardly 
answer anything else.


You are indeed a cognitive scientist but you don't seem to have a very 
good overall scientific/philosophical perspective on what that entails - 
and the status of cog. sci. is a fascinating one, philosophically. You 
see, I utterly believe in the cog. sci. approach of applying 
computational models to the brain and human thinking.  But what that has 
produced is *not* hard knowledge. It has made us aware of the 
complexities of what is probably involved, got us to the point where we 
are, so to speak, v. warm / close to the truth. But no, as, I think 
Ben asserted, what we actually *know* for sure about the brain's 
information processing is v. v. little.  (Just look at our previous 
dispute, where clearly there is no definite knowledge at all about how 
much parallel computation is involved in the brain's processing of any 
idea [like a sentence]). Those cog. sci. models are more like analogies 
than true theoretical models. And anyway most of the time, though by no 
means all, cognitive scientists are like you & Minsky - much more 
interested in the AI applications of their models than in their literal 
scientific truth.


If you disagree, point to the hard knowledge re items like those listed 
above,  which surely must be the basis of any AI system that can 
legitimately claim to be based on the brain's architecture.


Well, it is difficult to know where to start.  What about the word 
priming results?  There is an enormous corpus of data concerning the 
time course of activation of words as a result of seeing/hearing other 
words.  I can use some of that data to constrain my models of activation.


Then there are studies of speech errors that show what kinds of events 
occur during attempts to articulate sentences:  that data can be used to 
say a great deal about the processes involved in going from an intention 
to articulation.


On and on the list goes:  I could spend all day just writing down 
examples of cognitive data and how it relates to models of intelligence.


Did you know, for example, that certain kinds of brain damage can leave 
a person with the ability to name a visually presented object, but then 
be unable to pick the object up and move it through space in a way that 
is consistent with the object's normal use ... and that another type 
of brain damage can result in a person having exactly the opposite 
problem:  they can look at an object and say "I have no idea what that 
is", and yet when you ask them to pick the thing up and do what they 
would typically do with the object, they pick it up and show every sign 
that they know exactly what it is for (e.g. the object is a key:  they say 
they don't know what it is, but then they pick it up and put it straight 
into a nearby lock).


Now, interpreting that result is not easy, but it does seem to tell us 
that there are two almost independent systems in the brain that handle 
vision-for-identification and vision-for-action.  Why?  I don't know, 
but I have some ideas, and those ideas are helping to constrain my 
framework.




Another example of where you are not so hot on the *philosophy* of cog. 
sci. is our v. first dispute.  I claimed and claim that it is 
fundamental to cog sci to treat the brain/mind as rational. And I'm 
right - and produced and can continue endlessly producing evidence. (It 
is fundamental to all the social sciences to treat humans as rational 
decisionmaking agents). Oh no it doesn't, you said, in 

Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Richard Loosemore

Ed Porter wrote:

Mark,


MARK WASER=== You claim that "It is actually showing that you can do
something roughly equivalent to growing neural gas (GNG) in a space with
something approaching 500,000 dimensions, but you can do it without normally
having to deal with more than a few of those dimensions at one time."
Collins makes no claims that even remotely resemble this.  He *is* taking a
deconstructionist approach (which Richard and many others would argue
vehemently with) -- but that is virtually the entirety of the overlap
between his paper and your claims.  Where do you get all this crap about
500,000 dimensions, for example?

ED PORTER= The 500K dimensions were mentioned several times in a
lecture Collins gave at MIT about his parser.  This was probably 5 years ago,
so I am not 100% sure the number was 500K, but I am about 90% sure that was
the number used, and 100% sure the number was well over 100K.  The very
large size of the number of dimensions was mentioned repeatedly by both
Collins and at least one other professor with whom I talked after the
lecture.  One of the points both emphasized was that by use of the kernel
trick he was effectively matching in a 500K dimensional space, without
having to deal with most of those dimensions at any one time (although it
is my understanding that, over many parses, the system would deal with a
large percent of all those dimensions).  


It sounds like you may have misunderstood the relevance of the high 
number of dimensions.


Correct me if I am wrong, but Collins is not really matching in large 
numbers of dimensions; he is using the kernel trick to transform a 
nonlinear CLASSIFICATION problem into a high-dimensional linear 
classification problem.


This is just a trick to enable a better type of supervised learning.
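
(To make that concrete, a minimal numpy sketch -- my own illustration, not
anything from Collins's paper or this thread -- of what the trick amounts to:
an inner product in a higher-dimensional feature space can be computed from
the original low-dimensional vectors alone, so a *linear* separator in the
feature space corresponds to a *nonlinear* (here quadratic) boundary in the
original space.

import numpy as np

def phi(x):
    # Explicit map to the 3-D feature space of the degree-2 polynomial kernel.
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def poly_kernel(x, y):
    # The same inner product, computed without ever building phi(x) or phi(y).
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(y)))  # 16.0, via the explicit 3-D feature vectors
print(poly_kernel(x, y))       # 16.0, via the kernel alone -- identical

The classifier never needs the feature-space coordinates, only these kernel
values between training examples.)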

Would you follow me if I said that using supervised learning is of no 
use in general?  Because it means that someone has already (a) decided 
on the dimensions of representation in the initial problem domain, and 
(b) already done all the work of classifying the sentences into 
"syntactically correct" and "syntactically incorrect".  All that the SVM 
is doing is summarizing this training data in a nice compact form:  the 
high number of dimensions involved at one stage of the problem appears to 
be just an artifact of the method; it means nothing in general.


It especially does not mean that this supervised training algorithm is 
somehow able to break out and become an unsupervised, feature-discovery 
method, which it would have to do to be of any general interest.


I still have not read Collins' paper:  I am just getting this from my 
understanding of the math you have mentioned here.


It seems that whether or not he mentioned 500K dimensions or an infinite 
number of dimensions (which he could have done) makes no difference to 
anything.


If you think it does make a big difference, could you explain why?




Richard Loosemore





If you read papers on support vector machines using kernel methods you will
realize that it is well known that you can do certain types of matching and
other operations in high dimensional spaces without having to actually
deal in the high dimensions normally, by use of the kernel trick.  The
issue is often that of finding a particular kernel that works well for your
problem.  Collins shows the kernel trick can be extended to parse tree net
matching.  


With regard to my statement that the efficiency of the kernel trick could be
applied relatively generally, it is quite well supported by the following
text from page 4 of the paper.

"This paper and previous work by Lodhi et al. [12] examining the application
of convolution kernels to strings provide some evidence that convolution
kernels may provide an extremely useful tool for applying modern machine
learning techniques to highly structured objects. The key idea here is that
one may take a structured object and split it up into parts. If one can
construct kernels over the parts then one can combine these into a kernel
over the whole object. Clearly, this idea can be extended recursively so
that one only needs to construct kernels over the atomic parts of a
structured object. The recursive combination of the kernels over parts of an
object retains information regarding the structure of that object."

MARK WASER=== You also make statements that are explicitly contradicted in
the paper.  For example, you say "But there really seems to be no reason why
there should be any limit to the dimensionality of the space in which
Collins's algorithm works, because it does not use an explicit vector
representation" while his paper quite clearly states "Each tree is
represented by an n dimensional vector where the i'th component counts the
number of occurrences of the i'th tree fragment." (A mistake I believe you
made because you didn't understand the preceding sentence -- or, more
critically, *any* of the math.)

ED PORTER= The quote you give is from the last paragraph on page 

Re: [agi] None of you seem to be able ...

2007-12-05 Thread Mike Tintner

Richard:  science does too know a good deal about brain
architecture! ... I *know* cognitive science.  Cognitive science is a friend
of mine.  Mike, you are no cognitive scientist :-).

Thanks, Richard,  for keeping it friendly - but -   are you saying cog sci 
knows the:


*'engram' - how info is encoded
*any precise cognitive form or level of the hierarchical processing vaguely 
defined by Hawkins et al

*how ideas are compared at any level -
*how analogies are produced
*whether templates or similar are/are not used in visual object processing

etc. etc ???

Obviously, if science can't answer the engram question, it can hardly answer 
anything else.


You are indeed a cognitive scientist but you don't seem to have a very good 
overall scientific/philosophical perspective on what that entails - and the 
status of cog. sci. is a fascinating one, philosophically. You see, I 
utterly believe in the cog. sci. approach of applying computational models 
to the brain and human thinking.  But what that has produced is *not* hard 
knowledge. It has made us aware of the complexities of what is probably 
involved, got us to the point where we are, so to speak, v. warm / close 
to the truth. But no, as, I think Ben asserted, what we actually *know* for 
sure about the brain's information processing is v. v. little.  (Just look 
at our previous dispute, where clearly there is no definite knowledge at all 
about how much parallel computation is involved in the brain's processing of 
any idea [like a sentence]). Those cog. sci. models are more like analogies 
than true theoretical models. And anyway most of the time, though by no means 
all, cognitive scientists are like you & Minsky - much more interested in 
the AI applications of their models than in their literal scientific truth.


If you disagree, point to the hard knowledge re items like those listed 
above,  which surely must be the basis of any AI system that can 
legitimately claim to be based on the brain's architecture.


Another example of where you are not so hot on the *philosophy* of cog. sci. 
is our v. first dispute.  I claimed and claim that it is fundamental to cog 
sci to treat the brain/mind as rational. And I'm right - and produced and 
can continue endlessly producing evidence. (It is fundamental to all the 
social sciences to treat humans as rational decisionmaking agents). "Oh no it 
doesn't," you said, in effect - sci psychology is obsessed with the 
irrationalities of the human mind. And that is true, too. If you hadn't gone 
off in high dudgeon, we could have resolved the apparent contradiction. Sci 
psych does indeed love to study and point out all kinds of illusions and 
mistakes of the human mind. But to cog. sci. these are all so many *bugs* in 
an otherwise rational system. The system as a whole is still rational, as 
far as cog sci is concerned, but some of its parts - its heuristics, 
attitudes etc - are not. They, however, can be fixed.


So what I have been personally asserting elsewhere - namely that the brain 
is fundamentally irrational or crazy - that the human mind can't follow a 
logical, joined up train of reflective thought for more than a relatively 
few seconds on end - and is positively designed to be like that, and can't 
and isn't meant to be fixed  - does indeed represent a fundamental challenge 
to cog. sci's current rational paradigm of mind. (The flip side of that 
craziness is that it is a fundamentally *creative* mind -  this is utterly 
central to AGI)






Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Mark Waser

ED PORTER= The 500K dimensions were mentioned several times in a
lecture Collins gave at MIT about his parser.  This was probably 5 years ago,
so I am not 100% sure the number was 500K, but I am about 90% sure that was
the number used, and 100% sure the number was well over 100K.

OK.  I'll bite.  So what do *you* believe that these dimensions are?  Words? 
Word pairs?  Entire sentences?  Different trees? 





RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Ed Porter
Richard,

It actually is more valuable than you say.  

First, the same kernel trick can be used for GNG type unsupervised learning
in high dimensional spaces.  So it is not limited to supervised learning.

Second, you are correct in saying that through the kernel trick it is
actually doing almost all of its computations in a lower dimensional
space.  

But unlike with many kernel tricks, in this one the system actually directly
accesses each of the dimensions in the space, in different combinations as
necessary.  That is important.  It means that you can have a space with as
many dimensions as there are features or patterns in your system and still
efficiently do similarity matching (but not distance matching).

Ed Porter

-Original Message-
From: Richard Loosemore [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 05, 2007 2:37 PM
To: agi@v2.listbox.com
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

Ed Porter wrote:
 Mark,
 
 MARK WASER=== You claim that "It is actually showing that you can do
 something roughly equivalent to growing neural gas (GNG) in a space with
 something approaching 500,000 dimensions, but you can do it without normally
 having to deal with more than a few of those dimensions at one time."
 Collins makes no claims that even remotely resemble this.  He *is* taking a
 deconstructionist approach (which Richard and many others would argue
 vehemently with) -- but that is virtually the entirety of the overlap
 between his paper and your claims.  Where do you get all this crap about
 500,000 dimensions, for example?
 
 ED PORTER= The 500K dimensions were mentioned several times in a
 lecture Collins gave at MIT about his parser.  This was probably 5 years ago,
 so I am not 100% sure the number was 500K, but I am about 90% sure that was
 the number used, and 100% sure the number was well over 100K.  The very
 large size of the number of dimensions was mentioned repeatedly by both
 Collins and at least one other professor with whom I talked after the
 lecture.  One of the points both emphasized was that by use of the kernel
 trick he was effectively matching in a 500K dimensional space, without
 having to deal with most of those dimensions at any one time (although it
 is my understanding that, over many parses, the system would deal with a
 large percent of all those dimensions).  

It sounds like you may have misunderstood the relevance of the high 
number of dimensions.

Correct me if I am wrong, but Collins is not really matching in large 
numbers of dimensions; he is using the kernel trick to transform a 
nonlinear CLASSIFICATION problem into a high-dimensional linear 
classification problem.

This is just a trick to enable a better type of supervised learning.

Would you follow me if I said that using supervised learning is of no 
use in general?  Because it means that someone has already (a) decided 
on the dimensions of representation in the initial problem domain, and 
(b) already done all the work of classifying the sentences into 
"syntactically correct" and "syntactically incorrect".  All that the SVM 
is doing is summarizing this training data in a nice compact form:  the 
high number of dimensions involved at one stage of the problem appears to 
be just an artifact of the method; it means nothing in general.

It especially does not mean that this supervised training algorithm is 
somehow able to break out and become an unsupervised, feature-discovery 
method, which it would have to do to be of any general interest.

I still have not read Collins' paper:  I am just getting this from my 
understanding of the math you have mentioned here.

It seems that whether or not he mentioned 500K dimensions or an infinite 
number of dimensions (which he could have done) makes no difference to 
anything.

If you think it does make a big difference, could you explain why?




Richard Loosemore




 If you read papers on support vector machines using kernel methods you will
 realize that it is well known that you can do certain types of matching and
 other operations in high dimensional spaces without having to actually
 deal in the high dimensions normally, by use of the kernel trick.  The
 issue is often that of finding a particular kernel that works well for your
 problem.  Collins shows the kernel trick can be extended to parse tree net
 matching.  
 
 With regard to my statement that the efficiency of the kernel trick could
be
 applied relatively generally, it is quite well supported by the following
 text from page 4 of the paper.
 
 This paper and previous work by Lodhi et al. [12] examining the
application
 of convolution kernels to strings provide some evidence that convolution
 kernels may provide an extremely useful tool for applying modern machine
 learning techniques to highly structured objects. The key idea here is
that
 one may take a structured object and split it up into parts. If one can
 construct kernels over the parts then one can 

RE: [agi] None of you seem to be able ...

2007-12-05 Thread Ed Porter
Richard, 

I quickly reviewed your paper, and you will be happy to note that I
had underlined and highlighted it, so such skimming was more valuable than it
otherwise would have been.

With regard to COMPUTATIONAL IRREDUCIBILITY, I guess a lot depends
on definition. 

Yes, my vision of a human AGI would be a very complex machine.  Yes,
a lot of its outputs could only be made with human level reasonableness
after a very large amount of computation.  I know of no shortcuts around the
need to do such complex computation.  So it arguably falls into what you
say Wolfram calls computational irreducibility.  

But the same could be said for any of many types of computations,
such as large matrix equations or Google's map-reduces, which are routinely
performed on supercomputers.

So if that is how you define irreducibility, it's not that big a
deal.  It just means you have to do a lot of computing to get an answer,
which I have assumed all along for AGI (remember, I am the one pushing for
breaking the small hardware mindset).  But it doesn't mean we don't know how
to do such computing or that we have to do a lot more complexity research,
of the type suggested in your paper, before we can successfully design
AGIs.

With regard to GLOBAL-LOCAL DISCONNECT, again it depends what you
mean.  

You define it as:

"The GLD merely signifies that it might be difficult or
impossible to derive analytic explanations of global regularities that we
observe in the system, given only a knowledge of the local rules that drive
the system."

I don't know what this means.  Even the game of Life referred to in
your paper can be analytically explained.  It is just that some of the
things that happen are rather complex and would take a lot of computing to
analyze.  So does the global-local disconnect apply to anything where an
explanation requires a lot of analysis?  If that is the case, then any large
computation, of the type which mankind does and designs every day, would
have a global-local disconnect.

If that is the case, the global-local disconnect is no big deal.  We
deal with it every day.

I don't know exactly what you mean by "regularities" in the above
definition, but I think you mean something equivalent to patterns or
meaningful generalizations.  In many types of computing commonly done, you
don't know what the regularities will be without tremendous computing.  For
example, in principal component analysis you often don't know what the major
dimensions of a distribution will be until you do a tremendous amount of
computation.  Does that mean there is a GLD in that problem?  If so, it
doesn't seem to be a big deal.  PCA is done all the time, as are all sorts
of other complex matrix computations.
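
(For concreteness, a small numpy sketch -- my own toy example, not from the
discussion -- of that PCA point: the dominant direction of a data set is a
global regularity that only falls out of computation over all the local data
points.

import numpy as np

rng = np.random.default_rng(0)
# 1000 points stretched along a hidden diagonal direction.
raw = rng.normal(size=(1000, 2)) * np.array([3.0, 0.3])
angle = np.deg2rad(30)
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
data = raw @ rot.T

# PCA: eigen-decompose the covariance matrix of the centered data.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# The dominant eigenvector recovers the hidden 30-degree axis of variation.
principal = eigvecs[:, np.argmax(eigvals)]
print(np.rad2deg(np.arctan2(principal[1], principal[0])))  # ~30 (or ~-150)

Nothing about any single point tells you the answer; the regularity emerges
only from processing the whole distribution.)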

But you have implied multiple times that you think the global-local
disconnect is a big, big deal.  You have implied multiple times that it presents
a major problem to developing AGI.  If I interpret your prior statements,
taken in conjunction with your paper, correctly, I am guessing your major
thrust is that it will be very difficult to design AGIs where the desired
behavior is to be the result of many causal relations between a vast number
of active elements, because in such a system the causality is so non-linear
and complex that we cannot currently properly think and design in terms of
it.  

Although this proposition is not obviously true on its face, it is
arguably also not obviously false on its face.

Although it is easy to design systems where the system's behavior
would be sufficiently chaotic that such design would be impossible, it seems
likely that it is also possible to design complex systems in which the
behavior is not so chaotic or unpredictable.  Take the internet.  Something
like 10^8 computers talk to each other, and in general it works as designed.
Take IBM's supercomputer BlueGene/L: a 64K dual-core-processor machine, each
processor with at least 256 MBytes, all capable of receiving and passing
messages at 4 GHz on each of over 3 dimensions, and capable of performing
100's of trillions of FLOP/sec.  Such a system probably contains at least 10^14
non-linear, separately functional elements, and yet it works as designed.  If
there is a global-local disconnect in the BlueGene/L, which there could be
depending on your definition, it is not a problem for most of the
computation it does.

So why are we to believe, as your paper seems to suggest, that we
have to do some scan of complexity space before we can design AGI systems?

In the AGI I am thinking of one would be able to predict many of the
behaviors of the machine, at least at a general level from local rules,
because the system has been designed to produce certain types of results in
certain types of situations.  Of course, because the system is large the
inferencing from each of the many local rules would require a hell of a lot
of computing, so much 

RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Ed Porter
Mark,

The paper said:

"Conceptually we begin by enumerating all tree fragments that occur in the
training data 1,...,n."

Those are the dimensions: all of the parse tree fragments in the training
data.  And as I pointed out in an email I just sent to Richard, although
usually only a small set of them are involved in any one match between two
parse trees, they can all be used over a set of many such matches.

So the full dimensionality is actually there; it is just that only a
particular subset of them is being used at any one time.  And when the
system is waiting for the next tree to match, it is potentially capable of
matching it against any of its dimensions.
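
(To make that concrete, a minimal Python sketch -- the fragment names and the
dot-product matching rule are illustrative assumptions, not from the paper --
of a sparse fragment-count vector: the full set of dimensions exists, but any
one tree, and any one match, touches only a handful of them.

from collections import Counter

def fragment_vector(fragments):
    # Sparse fragment-count vector: fragment id -> count.
    return Counter(fragments)

def sparse_dot(u, v):
    # Inner product over the (few) dimensions both trees actually use.
    if len(v) < len(u):
        u, v = v, u
    return sum(count * v[frag] for frag, count in u.items() if frag in v)

# Hypothetical fragment ids drawn from a space of hundreds of thousands.
tree1 = fragment_vector(["NP->D N", "D->the", "N->cat", "S->NP VP"])
tree2 = fragment_vector(["NP->D N", "D->the", "N->dog", "VP->V NP"])

print(sparse_dot(tree1, tree2))  # 2 shared fragments out of the huge space

The untouched dimensions cost nothing, yet every one of them is available the
moment some future tree happens to use it.)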

Ed Porter

-Original Message-
From: Mark Waser [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 05, 2007 3:07 PM
To: agi@v2.listbox.com
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

ED PORTER= The 500K dimensions were mentioned several times in a
lecture Collins gave at MIT about his parser.  This was probably 5 years ago,
so I am not 100% sure the number was 500K, but I am about 90% sure that was
the number used, and 100% sure the number was well over 100K.

OK.  I'll bite.  So what do *you* believe that these dimensions are?  Words?

Word pairs?  Entire sentences?  Different trees? 



Re: [agi] How to represent things problem

2007-12-05 Thread Vladimir Nesov
On Dec 5, 2007 7:13 PM, Richard Loosemore [EMAIL PROTECTED] wrote:

 Vladimir Nesov wrote:
  Richard,
 
  I'll try to summarize my solutions to these problems, which allow one to
  use a network without the need for explicit copying of instances (or any
  other kind of explicit allocation of entities which are to correspond
  to instances). (Although my model also requires ubiquitous induction
  between nodes which disregards network structure.)
 
  Basic structure of network: network is 'spiking' in the sense that it
  operates in real time and links between nodes have a delay. Input
  nodes send in the network sensory data, output nodes read actions. All
  links between nodes can shift over time and experience through
  induction. Initial configuration specifies simple pathways from input
  to output, shifting of links changes these pathways, making them more
  intricate to reflect experience.
 
  Scene (as a graph which describes objects) is represented by active
  nodes: node being active corresponds to feature being included in the
  scene. Not all features present in the scene are active at the same
  time, some of them can activate periodically, every several tacts or
  more, and some other features can be represented by summarizing
  simplified features (node 'apple' instead of 3D sketch of its
  surface).
 
  Network edges (links) activate the nodes. If condition (configuration
  of nodes from which link originates) for a link is satisfied, and link
  is active, it activates the target node.
 
  Activation in the network follows a variation of Hebbian rule,
  'induction rule' (which is essential for mechanism of instance
  representation): link becomes active (starts to activate its target
  node) only if it observed that node to be activated after condition
  for link was satisfied in a majority of cases (like 90% or more). So,
  if some node is activated in a network, there are good reasons for
  that, no blind association-seeking.
 
  Representation of instances. If scene contains multiple instances of
  the same object (or pattern, say an apple), and these patterns are not
  modified in it, there is no point in representing those instances
  separately: all places at which instances are located ('instantiation
  points', say places where apples lie or hang) refer to the same
  pattern. The only problem is modification of instances in specific
  instantiation points.
 
  This scene can be implemented by creating links from instantiation
  points to nodes that represent the pattern. As a result, during
  activation cycle of represented scene, activation of instantiation
  points leads to activation of patterns (as there's only one pattern
  for each instantiation point, so induction rule works in this
  direction), but not in other direction (as there are many
  instantiation points for the pattern, none of them will be a target of
  a link originating from the pattern).
 
  This one-way activation results in a propagation of 'activation waves'
  from instantiation points to the pattern, so that each wave 'outlines'
  both pattern and instantiation point. These waves effectively
  represent instances. If there's a modifier associated with specific
  instantiation point, during an activation wave it will activate during
  the same wave as pattern does, and as a result it can be applied to
  it. As other instantiation points refer to the pattern 'by value',
  pattern at those points won't change much.
 
  Also, this way of representing instances is central to extraction of
  similarities: if several objects are similar, they will share some of
  their nodes and as a result their structures will influence one
  another, creating a pressure to extract a common pattern.

 I have questions at this point.

 Your notion of "instantiation point" sounds like what I would call an
 "instance node" which is created on the fly.

No, it's not that; I'll try to clarify using a more detailed example.
Say there are these apples (to which I referred as a 'pattern'),
which are all represented by a single clump of nodes, the same as would
be used for a single apple. Instantiation points are actual objects
that in some sense 'hold' the apples on the scene, for example a
particular plate on which an apple lies. In the scene, there is (for
simplicity) only one plate, and there's always an apple that lies on
it. So, we can create a PLATE-APPLE link, and this link satisfies the
induction rule, since whenever PLATE is encountered, there's an APPLE
on it. Here, PLATE is an instantiation point, and APPLE is a pattern.
If the scene also contains an apple-tree BRANCH, on which there's also an
APPLE hanging, we can create a BRANCH-APPLE link. But we can't create
an APPLE-PLATE link, since at one of the instantiation points (BRANCH),
PLATE is not there when APPLE is. Also, these links are short-term
things (as plates don't always have apples on them), but the scene can be
stored long-term if they are duplicated on new nodes corresponding to
these nodes 

Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Mark Waser
"Dimensions" is an awfully odd word for that, since dimensions are normally 
assumed to be orthogonal.


- Original Message - 
From: Ed Porter [EMAIL PROTECTED]

To: agi@v2.listbox.com
Sent: Wednesday, December 05, 2007 5:08 PM
Subject: RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]


Mark,

The paper said:

"Conceptually we begin by enumerating all tree fragments that occur in the
training data 1,...,n."

Those are the dimensions: all of the parse tree fragments in the training
data.  And as I pointed out in an email I just sent to Richard, although
usually only a small set of them are involved in any one match between two
parse trees, they can all be used over a set of many such matches.

So the full dimensionality is actually there; it is just that only a
particular subset of them is being used at any one time.  And when the
system is waiting for the next tree to match, it is potentially capable of
matching it against any of its dimensions.

Ed Porter

-Original Message-
From: Mark Waser [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 05, 2007 3:07 PM
To: agi@v2.listbox.com
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

ED PORTER= The 500K dimensions were mentioned several times in a
lecture Collins gave at MIT about his parser.  This was probably 5 years ago,
so I am not 100% sure the number was 500K, but I am about 90% sure that was
the number used, and 100% sure the number was well over 100K.

OK.  I'll bite.  So what do *you* believe that these dimensions are?  Words?

Word pairs?  Entire sentences?  Different trees?




Re: [agi] None of you seem to be able ...

2007-12-05 Thread Mike Tintner

Richard: Now, interpreting that result is not easy,

Richard, I get the feeling you're getting understandably tired with all your 
correspondence today. Interpreting *any* of the examples of *hard* cog sci 
that you give is not easy. They're all useful, stimulating stuff, but they 
don't add up to a hard pic. of the brain's cognitive architecture. Perhaps 
Ben will back me up on this - it's a rather important point - our overall 
*integrated* picture of the brain's cognitive functioning is really v. poor, 
although certainly we have a wealth of details about, say, which part of the 
brain is somehow connected to a given operation.


Richard:I admit that I am confused right
now:  in the above paragraphs you say that your position is that the
human mind is 'rational' and then later that it is 'irrational' - was
the first one of those a typo?

Richard, No typo whatsoever if you just reread. V. clear. I say and said: 
*scientific psychology* and *cog sci* treat the mind as rational. I am the 
weirdo who is saying this is nonsense - the mind is 
irrational/crazy/creative - rationality is a major *achievement*, not 
something that comes naturally. Mike Tintner = crazy/irrational - somehow, I 
don't think you'll find that hard to remember. 





Re: [agi] How to represent things problem

2007-12-05 Thread Vladimir Nesov
Richard,

I'll try to summarize my solutions to these problems, which allow one to
use a network without the need for explicit copying of instances (or any
other kind of explicit allocation of entities which are to correspond
to instances). (Although my model also requires ubiquitous induction
between nodes which disregards network structure.)

Basic structure of the network: the network is 'spiking' in the sense that it
operates in real time and links between nodes have a delay. Input
nodes feed sensory data into the network; output nodes read out actions. All
links between nodes can shift over time and with experience, through
induction. The initial configuration specifies simple pathways from input
to output; shifting of links changes these pathways, making them more
intricate to reflect experience.

A scene (as a graph which describes objects) is represented by active
nodes: a node being active corresponds to a feature being included in the
scene. Not all features present in the scene are active at the same
time; some of them can activate periodically, every several ticks (time
steps) or more, and some other features can be represented by summarizing,
simplified features (a node 'apple' instead of a 3D sketch of its
surface).

Network edges (links) activate the nodes. If the condition (the configuration
of nodes from which the link originates) for a link is satisfied, and the link
is active, it activates the target node.

Activation in the network follows a variation of the Hebbian rule, the
'induction rule' (which is essential for the mechanism of instance
representation): a link becomes active (starts to activate its target
node) only if it has observed that node to be activated after the condition
for the link was satisfied in a majority of cases (like 90% or more). So,
if some node is activated in the network, there are good reasons for
that; no blind association-seeking.
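
(A minimal Python sketch of one possible reading of this induction rule --
the counters and the 90% threshold are assumptions drawn from the description
above, not a specification of the model:

class Link:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.condition_count = 0   # times the link's condition was satisfied
        self.followed_count = 0    # times the target node then became active

    def observe(self, condition_satisfied, target_activated_next):
        # Update statistics from one tick of network activity.
        if condition_satisfied:
            self.condition_count += 1
            if target_activated_next:
                self.followed_count += 1

    @property
    def active(self):
        # The link may activate its target only if the target has reliably
        # followed the condition in past experience.
        if self.condition_count == 0:
            return False
        return self.followed_count / self.condition_count >= self.threshold

link = Link()
for _ in range(95):
    link.observe(True, True)
for _ in range(5):
    link.observe(True, False)
print(link.active)  # True: the target followed the condition 95% of the time

The asymmetry below -- PLATE-APPLE passes the test while APPLE-PLATE does not
-- falls out of exactly this statistic.)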

Representation of instances. If a scene contains multiple instances of
the same object (or pattern, say an apple), and these patterns are not
modified in it, there is no point in representing those instances
separately: all places at which instances are located ('instantiation
points', say places where apples lie or hang) refer to the same
pattern. The only problem is modification of instances at specific
instantiation points.

This scene can be implemented by creating links from instantiation
points to nodes that represent the pattern. As a result, during the
activation cycle of the represented scene, activation of instantiation
points leads to activation of patterns (as there's only one pattern
for each instantiation point, so the induction rule works in this
direction), but not in the other direction (as there are many
instantiation points for the pattern, none of them will be a target of
a link originating from the pattern).

This one-way activation results in a propagation of 'activation waves'
from instantiation points to the pattern, so that each wave 'outlines'
both pattern and instantiation point. These waves effectively
represent instances. If there's a modifier associated with specific
instantiation point, during an activation wave it will activate during
the same wave as pattern does, and as a result it can be applied to
it. As other instantiation points refer to the pattern 'by value',
pattern at those points won't change much.

Also, this way of representing instances is central to extraction of
similarities: if several objects are similar, they will share some of
their nodes and as a result their structures will influence one
another, creating a pressure to extract a common pattern.

Creation of new nodes. Each new node, during a creation phase,
corresponds to an existing node (the 'original node') in the network.
During this phase (which isn't long), each activated link that
connects to the original node (both incoming and outgoing connections) is
copied so that, in the copy, the original node is substituted by the new node.
As a result, the new node will be active in situations in which the original
node activated during creation of the new node. The new node can represent
episodic memory or a more specific subcategory of the category represented
by the original node. Initially, the new node doesn't influence behavior of
the system (as it's activated in a subset of ticks in which the original
node can activate), but because of this difference it can obtain
inductive links different from those that fit the original node.



On Dec 5, 2007 4:47 AM, Richard Loosemore [EMAIL PROTECTED] wrote:
 Dennis Gorelik wrote:
  Richard,
 
  3) A way to represent things - and in particular, uncertainty - without
  getting buried up to the eyeballs in (e.g.) temporal logics that nobody
  believes in.
 
  Conceptually, the way of representing things is described very well.
  It's a Neural Network -- a set of nodes (concepts), where every node can be
  connected with a set of other nodes. Every connection has its own
  weight.
 
  Some nodes are connected with external devices.
  For example, one node can be connected with one word in text
  dictionary (that is an external device).
 
 
  Do you see any 

Re: [agi] Flexibility of AI vs. a PC

2007-12-05 Thread Richard Loosemore

William Pearson wrote:

One thing that has been puzzling me for a while is why some people
expect an intelligence to be less flexible than a PC.

What do I mean by this? A PC can have any learning algorithm, bias or
representation of data we care to create. This raises another
question: how are we creating a representation if not by copying it, in
some sense, from our brains? So why do we still create systems that
have fixed representations of the external world and fixed methods of
learning?

Take the development of echolocation in blind people, or the ability
to take in visual information from stimulation of the tongue. Isn't this
sufficient evidence to suggest we should be trying to make our AIs as
flexible as the most flexible things we know?


Well said.



Richard Loosemore



Re: [agi] None of you seem to be able ...

2007-12-05 Thread Richard Loosemore

Mike Tintner wrote:


Ben:  Obviously the brain contains answers to many of the unsolved 
problems of AGI (not all -- e.g. not the problem of how to create a stable 
goal system under recursive self-improvement).  However, current 
neuroscience does NOT contain these answers. And neither you nor anyone 
else has ever made a cogent argument that emulating the brain is the ONLY 
route to creating powerful AGI.


Absolutely agree re neuroscience's lack of answers (hence Richard's 
assertion that his system is based on what cognitive science knows about 
brain architecture is not a smart one -  the truth is not much at all.)


Um, excuse me?

Let me just make sure I understand this:  you say that it is not smart 
of me to say that my system is based on what cognitive science knows 
about brain architecture, because cognitive science knows not much at 
all about brain architecture?


Number one:  I don't actually say that (brain architecture is only a 
small part of what is involved in my system).


Number two:  Cognitive science does too know a good deal about brain 
architecture!


I *know* cognitive science.  Cognitive science is a friend of mine. 
Mike, you are no cognitive scientist :-).



Richard Loosemore

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=72293683-687e21


Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Mark Waser

HeavySarcasmWow.  Is that what dot products are?/HeavySarcasm

You're confusing all sorts of related concepts with a really garbled 
vocabulary.


Let's do this with some concrete 10-D geometry . . . . Vector A runs from 
(0,0,0,0,0,0,0,0,0,0) to (1, 1, 0,0,0,0,0,0,0,0).  Vector B runs from 
(0,0,0,0,0,0,0,0,0,0) to (1, 0, 1,0,0,0,0,0,0,0).


Clearly A and B share the first dimension.  Do you believe that they share 
the second and the third dimension?  Do you believe that dropping out the 
fourth through tenth dimension in all calculations is some sort of huge 
conceptual breakthrough?


The two vectors are similar in the first dimension (indeed, in all but the 
second and third) but otherwise very distant from each other (i.e. they are 
*NOT* similar).  Do you believe that these vectors are similar or distant?
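
For the record, here is a small numeric check of the two example vectors
above (an editorial sketch in plain Python; which verdict one reaches
depends entirely on the similarity measure chosen):

from math import sqrt

A = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
B = (1, 0, 1, 0, 0, 0, 0, 0, 0, 0)

dot = sum(a * b for a, b in zip(A, B))                 # 1: only dimension 1 matches
norm_a = sqrt(sum(a * a for a in A))                   # |A| = sqrt(2)
norm_b = sqrt(sum(b * b for b in B))                   # |B| = sqrt(2)
print(dot, dot / (norm_a * norm_b))                    # dot = 1, cosine = 0.5
print(sqrt(sum((a - b) ** 2 for a, b in zip(A, B))))   # Euclidean distance ~ 1.414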


THE ALLEGATION BELOW THAT I MISUNDERSTOOD THE MATH BECAUSE THOUGHT 
COLLIN'S PARSER DIDN'T HAVE TO DEAL WITH A VECTOR HAVING THE FULL 
DIMENSIONALITY OF THE SPACE BEING DEALT WITH IS CLEARLY FALSE.


My allegation was that you misunderstood the math because you claimed that 
Collin's paper does not use an explicit vector representation while 
Collin's statements and the math itself makes it quite clear that they are 
dealing with a vector representation scheme.  I'm now guessing that you're 
claiming that you intended explicit to mean full dimensionality. 
Whatever.  Don't invent your own meanings for words and you'll be 
misunderstood less often (unless you continue to drop out key words like in 
the capitalized sentence above).



-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=72452073-36665f


RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Ed Porter
Mark, 

Your last email started OK.  I'll bite. 

I guess you didn't bite for very long.  We are already back to explicitly
marked HeavySarcasm mode.

I guess one could argue, as you seem to be doing, that indicating which of
the 500k dimensions had a match between the two subtrees currently being
compared could be considered equivalent to explicitly representing a huge
500k-dimensional binary vector -- but I think one could more strongly claim
that such an indication would be, at best, only an implicit representation
of the 500k vector.

THE KEY POINT I WAS TRYING TO GET ACROSS WAS ABOUT NOT HAVING TO EXPLICITLY
DEAL WITH 500K TUPLES in each match, which is what I meant when I said not
explicitly deal with the high dimensional vectors.  This is a big plus in
terms of representational and computational efficiency.  I did not say there
was nothing equivalent to an implicit use of the high dimensional vector,
because kernels implicitly do use high dimensional vectors, but they do so
implicitly rather than explicitly.  That is why they increase efficiency.
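
To make the explicit-versus-implicit distinction concrete, here is a toy
sketch (an editorial illustration only, not code from Collins's paper) in
which the same dot product is computed either from full-length vectors or
from a small map of non-zero indices:

# Editorial sketch: explicit dense vectors vs. an implicit sparse representation.
# Only the handful of dimensions that actually match contribute to the score,
# so nothing close to the full dimensionality has to be touched per comparison.

N = 500_000                        # nominal dimensionality of the feature space

def dot_explicit(u, v):            # u, v are length-N sequences (wasteful)
    return sum(a * b for a, b in zip(u, v))

def dot_implicit(u, v):            # u, v are dicts {dimension_index: count}
    if len(u) > len(v):
        u, v = v, u
    return sum(count * v[i] for i, count in u.items() if i in v)

u = {17: 2, 90_210: 1, 499_999: 3}       # hypothetical non-zero components
v = {17: 1, 499_999: 1}
assert dot_implicit(u, v) == 5           # 2*1 + 3*1; no 500K-element list is built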

My Merriam-Webster's Collegiate Dictionary gives as its first (which usually
means most common) definition of "explicit" the following:

     "fully revealed or expressed without vagueness,
     implication, or ambiguity."

The information that the two subtrees to be matched contain a given set of
subtrees, defined by their indices, does not by itself define a full 500K
vector, nor even the full dimensionality of the vector.  That information
can only be derived from other information, which presumably is not even
used in the match procedure.

Of course there are other definitions of the word explicit which mean
exact, and you could argue that indicating a few of the 500K indices is
equivalent to exactly specifying a corresponding 500K-dimensional vector,
once one takes other information into account.

When a use of a word in a given statement has two interpretations, one of
which is correct, it is not clear one has the right to attack the person
making that statement for being incorrect.  At most you can attack him for
being ambiguous.  And normally on this list people do not attack other
people as rudely as you have attacked me for merely being ambiguous.

Ed Porter


-Original Message-
From: Mark Waser [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 05, 2007 3:40 PM
To: agi@v2.listbox.com
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

HeavySarcasmWow.  Is that what dot products are?/HeavySarcasm

You're confusing all sorts of related concepts with a really garbled 
vocabulary.

Let's do this with some concrete 10-D geometry . . . . Vector A runs from 
(0,0,0,0,0,0,0,0,0,0) to (1, 1, 0,0,0,0,0,0,0,0).  Vector B runs from 
(0,0,0,0,0,0,0,0,0,0) to (1, 0, 1,0,0,0,0,0,0,0).

Clearly A and B share the first dimension.  Do you believe that they share 
the second and the third dimension?  Do you believe that dropping out the 
fourth through tenth dimension in all calculations is some sort of huge 
conceptual breakthrough?

The two vectors are similar in the first dimension (indeed, in all but the 
second and third) but otherwise very distant from each other (i.e. they are 
*NOT* similar).  Do you believe that these vectors are similar or distant?

 THE ALLEGATION BELOW THAT I MISUNDERSTOOD THE MATH BECAUSE THOUGHT 
 COLLIN'S PARSER DIDN'T HAVE TO DEAL WITH A VECTOR HAVING THE FULL 
 DIMENSIONALITY OF THE SPACE BEING DEALT WITH IS CLEARLY FALSE.

My allegation was that you misunderstood the math because you claimed that 
Collin's paper does not use an explicit vector representation while 
Collin's statements and the math itself makes it quite clear that they are 
dealing with a vector representation scheme.  I'm now guessing that you're 
claiming that you intended explicit to mean full dimensionality. 
Whatever.  Don't invent your own meanings for words and you'll be 
misunderstood less often (unless you continue to drop out key words like in 
the capitalized sentence above).


-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?;

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=72881028-794447

Re: [agi] How to tepresent things problem

2007-12-05 Thread Richard Loosemore

Vladimir Nesov wrote:

Richard,

I'll try to summarize my solutions to these problems, which make it
possible to use a network without the need for explicit copying of
instances (or any other kind of explicit allocation of entities that
correspond to instances). (Although my model also requires ubiquitous
induction between nodes, which disregards network structure.)

Basic structure of the network: the network is 'spiking' in the sense
that it operates in real time and links between nodes have a delay. Input
nodes feed sensory data into the network; output nodes read out actions.
All links between nodes can shift over time and with experience, through
induction. The initial configuration specifies simple pathways from input
to output; the shifting of links changes these pathways, making them more
intricate to reflect experience.

A scene (as a graph which describes objects) is represented by active
nodes: a node being active corresponds to a feature being included in the
scene. Not all features present in the scene are active at the same time:
some of them can activate periodically, every several ticks or more, and
some other features can be represented by summarizing simplified features
(the node 'apple' instead of a 3D sketch of its surface).

Network edges (links) activate the nodes. If the condition for a link
(the configuration of nodes from which the link originates) is satisfied,
and the link is active, it activates the target node.

Activation in the network follows a variation of the Hebbian rule, an
'induction rule' (which is essential for the mechanism of instance
representation): a link becomes active (starts to activate its target
node) only if it has observed that node activating after the link's
condition was satisfied in a majority of cases (say 90% or more). So, if
some node is activated in the network, there are good reasons for it;
there is no blind association-seeking.
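
(A minimal sketch of the induction rule just described, added purely as an
editorial illustration; the per-link counters are an assumption about the
bookkeeping, and only the ~90% threshold comes from the text above:)

# Editorial sketch: a link starts activating its target only after the target
# has been observed to fire following the link's condition in ~90%+ of cases.

class InductiveLink:
    THRESHOLD = 0.9

    def __init__(self):
        self.condition_count = 0    # times the link's condition was satisfied
        self.followed_count = 0     # times the target node then activated

    def observe(self, condition_satisfied, target_fired_afterwards):
        if condition_satisfied:
            self.condition_count += 1
            if target_fired_afterwards:
                self.followed_count += 1

    @property
    def active(self):
        if self.condition_count == 0:
            return False
        return self.followed_count / self.condition_count >= self.THRESHOLD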

Representation of instances. If a scene contains multiple instances of
the same object (or pattern, say an apple), and these instances are not
modified in it, there is no point in representing those instances
separately: all places at which the instances are located ('instantiation
points', say the places where apples lie or hang) refer to the same
pattern. The only problem is the modification of instances at specific
instantiation points.

This scene can be implemented by creating links from instantiation
points to the nodes that represent the pattern. As a result, during the
activation cycle of the represented scene, activation of instantiation
points leads to activation of the pattern (as there's only one pattern
for each instantiation point, so the induction rule works in this
direction), but not in the other direction (as there are many
instantiation points for the pattern, none of them will be a target of
a link originating from the pattern).

This one-way activation results in a propagation of 'activation waves'
from instantiation points to the pattern, so that each wave 'outlines'
both pattern and instantiation point. These waves effectively
represent instances. If there is a modifier associated with a specific
instantiation point, it will activate during the same wave as the pattern
does, and as a result it can be applied to the pattern. Since the other
instantiation points refer to the pattern 'by value', the pattern at those
points won't change much.

Also, this way of representing instances is central to extraction of
similarities: if several objects are similar, they will share some of
their nodes and as a result their structures will influence one
another, creating a pressure to extract a common pattern.


I have questions at this point.

Your notion of instantiation point sounds like what I would call an 
instance node which is created on the fly.


There is nothing wrong with this in principle, I believe, but it all 
depends on the details of how these things are handled.  For example, it 
requires a *substantial* modification of the neural network idea to 
allow for the rapid formation of instance nodes, and that modification 
is so substantial that it would dominate the behavior of the system.  I 
don't know if you follow the colloquialism, but there is a sense in 
which the tail is wagging the dog:  the instance nodes are such an 
important mechanism that everything depends on the details of how they 
are handled.


So, to consider one or two of the details that you mention.  You would 
like there to be only a one-way connection between the generic node (do 
you call this the pattern node?) and the instance node (instantiation 
point?), so that the latter can contact the former, but not vice-versa. 
 Does this not contradict the data from psychology (if you care about 
that)?  For instance, we are able to see a field of patterns, of 
different colors, and then when someone says the phrase the green 
patterns we find that the set of green patterns jumps out at us from 
the scene.  It is as if we did indeed have links from the generic 
concept [green pattern] to all the instances.


You may not care about the psychology, but 

RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread Ed Porter
They need not be.

-Original Message-
From: Mark Waser [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 05, 2007 6:04 PM
To: agi@v2.listbox.com
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

Dimensions is an awfully odd word for that since dimensions are normally 
assumed to be orthogonal.

- Original Message - 
From: Ed Porter [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, December 05, 2007 5:08 PM
Subject: RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]


Mark,

The paper said:

Conceptually we begin by enumerating all tree fragments that occur in the
training data 1,...,n.

Those are the dimensions: all of the parse tree fragments in the training
data.  And as I pointed out in an email I just sent to Richard, although
usually only a small set of them are involved in any one match between two
parse trees, they can all be used over a set of many such matches.

So the full dimensionality is actually there; it is just that only a
particular subset of the dimensions is being used at any one time.  And when
the system is waiting for the next tree to match, it is potentially capable
of matching it against any of its dimensions.
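
Purely as an editorial illustration (this is not Collins's actual
dynamic-programming recursion), here is a toy sketch of how a kernel over
fragment counts only ever touches the fragments the two trees share:

# Editorial sketch: if each parse tree is summarized by counts of the tree
# fragments it contains, the kernel value is the dot product of those counts,
# and only fragments present in *both* trees ever need to be looked at.

from collections import Counter

def tree_kernel(frags1, frags2):
    """frags1, frags2: iterables of (hashable) tree-fragment identifiers."""
    c1, c2 = Counter(frags1), Counter(frags2)
    shared = c1.keys() & c2.keys()           # typically a tiny subset of 500K+
    return sum(c1[f] * c2[f] for f in shared)

# Hypothetical fragment identifiers standing in for subtrees:
t1 = ["NP->DT NN", "VP->V NP", "S->NP VP", "NP->DT NN"]
t2 = ["NP->DT NN", "VP->V NP", "PP->P NP"]
print(tree_kernel(t1, t2))                   # 2*1 + 1*1 = 3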

Ed Porter

-Original Message-
From: Mark Waser [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 05, 2007 3:07 PM
To: agi@v2.listbox.com
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

ED PORTER= The 500K dimensions were mentioned several times in a
lecture Collins gave at MIT about his parser.  This was probably 5 years ago
so I am not 100% sure the number was 500K, but I am about 90% sure that was
the number used, and 100% sure the number was well over 100K.

OK.  I'll bite.  So what do *you* believe that these dimensions are?  Words?

Word pairs?  Entire sentences?  Different trees?


-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?;

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?;



-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?;

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=72742511-f9bb8b

Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread David Hart
On 12/5/07, Matt Mahoney [EMAIL PROTECTED] wrote:


 [snip]  Centralized search is limited to a few big players that
 can keep a copy of the Internet on their servers.  Google is certainly
 useful,
 but imagine if it searched a space 1000 times larger and if posts were
 instantly added to its index, without having to wait days for its spider
 to
 find them.  Imagine your post going to persistent queries posted days
 earlier.
 Imagine your queries being answered by real human beings in addition to
 other
 peers.

 I probably won't be the one writing this program, but where there is a
 need, I
 expect it will happen.



Wikia, the company run by Wikipedia founder Jimmy Wales, is tackling the
Internet-scale distributed search problem -
http://search.wikia.com/wiki/Atlas

Connecting to related threads (some recent, some not-so-recent), the Grub
distributed crawler ( http://search.wikia.com/wiki/Grub ) is intended to be
one of many plug-in Atlas Factories. A development goal for Grub is to
enhance it with an NL toolkit (e.g. the soon-to-be-released RelEx), so it can
do more than parse simple keywords and calculate statistical word
relationships.

-dave

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=72165246-397899

RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

2007-12-05 Thread John G. Rose
 From: Matt Mahoney [mailto:[EMAIL PROTECTED]
 My design would use most of the Internet (10^9 P2P nodes).  Messages would
 be natural language text strings, making no distinction between documents,
 queries, and responses.  Each message would have a header indicating the ID
 and time stamp of the originator and any intermediate nodes through which
 the message was routed.  A message could also have attached files.  Each
 node would have a cache of messages and its own policy on which messages it
 decides to keep or discard.
 
 The goal of the network is to route messages to other nodes that store
 messages with matching terms.  To route an incoming message x, it matches
 terms in x to terms in stored messages and sends copies to nodes that
 appear in those headers, appending its own ID and time stamp to the header
 of the outgoing copies.  It also keeps a copy, so that the receiving nodes
 know that it has a copy of x (at least temporarily).
 
 The network acts as a distributed database with a distributed search
 function.  If X posts a document x and Y posts a query y with matching
 terms, then the network acts to route x to Y and y to X.
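
(An editorial sketch of the routing rule quoted above; the term-matching
policy, the headers as lists of (node_id, timestamp) pairs, and the cache
structure are assumptions made for illustration, not part of the proposal
itself:)

import time

class Peer:
    def __init__(self, node_id):
        self.node_id = node_id
        self.cache = []                          # list of (header, terms, text)

    def route(self, header, text, send):
        """header: list of (node_id, timestamp); send: callable(node_id, header, text)."""
        terms = set(text.lower().split())
        targets = set()
        for cached_header, cached_terms, _ in self.cache:
            if terms & cached_terms:             # a stored message shares a term
                targets.update(nid for nid, _ in cached_header)
        new_header = header + [(self.node_id, time.time())]
        self.cache.append((new_header, terms, text))      # keep a copy
        already_on_path = {nid for nid, _ in new_header}
        for nid in targets - already_on_path:    # don't route back along the path
            send(nid, new_header, text)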


The very tricky but required part of creating a global network like this is
going from zero nodes to whatever the goal is. I think that much of the design
emphasis needs to be put into the growth function. If you have 50 nodes
running how do you get to 500? And 500 to 5,000? And then if it goes down
from 50,000 to 10,000 fast how is it revived before crash? Engineering
expertise, ingenuity + maybe psychological and sociological wisdom can be
used to make this happen. And we all know that the growth could happen
quickly, even overnight. 

Then once getting to 10^9 nodes they have to be maintained or they can die
quickly and even instantaneously. 

Having an intelligent botnet has its advantages. Once it's running and users
try to uninstall it, the botnet can try to fight for survival by reasoning
with the users. You could make it such that a user has to verbally
communicate with it to remove it. The botnet could stall and ask things like
Why are you doing this to me after all I have done for you? User:sorry
charlie, I command you to uninstall! Bot:OK let's cut a deal... I know we
can work this out...

John


-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=72911975-ce1dcc


Re: [agi] None of you seem to be able ...

2007-12-05 Thread Benjamin Goertzel
Tintner wrote:
 Your paper represents almost a literal application of the idea that
 creativity is ingenious/lateral. Hey it's no trick to be just
 ingenious/lateral or fantastic.

Ah ... before creativity was what was lacking.  But now you're shifting
arguments and it's something else that is lacking ;-)


 You clearly like producing new psychological ideas - from a skimming of your
 work, you've produced several. However, I didn't come across a single one
 that was grounded or where any attempt was made to ground them in direct,
 fresh observation (as opposed to occasionally referring to an existing
 scientific paper).

That is a very strange statement.

In fact nearly all my psychological ideas
are grounded in direct, fresh **introspective** observation ---
but they're not written up that way
because that's not the convention in modern academia.  To publish your ideas
in academic journals, you need to ground them in the existing research
literature,
not in your own personal introspective observations.

It is true that few of my psychological hypotheses are grounded in my own novel
lab experiments, though.  I did a little psych lab work in the late
90's, in the domain of
perceptual illusions -- but the truth is that psych and neuroscience
are not currently
sophisticated enough to allow empirical investigation of really
interesting questions about
the nature of cognition, self, etc.  Wait a couple decades, I guess.

In terms of creative psychology, that is consistent with
 your resistance to producing prototypes - and grounding your
 invention/innovation.

Well, I don't have any psychological resistance to producing working
software, obviously.

Most of my practical software work has been proprietary for customers; but,
check out MOSES and OpenBiomind on Google Code -- two open-source projects that
have emerged from my Novamente LLC and Biomind LLC work ...

It just happens that AGI does not lend itself to prototyping, for
reasons I've already tried
and failed to explain to you.

We're gonna launch trainable, adaptive virtual animals in Second Life sometime
in 2008.  But I won't consider them real prototypes of Novamente
AGI, even though in
fact they will use several aspects of the Novamente Cognition Engine
software.  They
won't embody the key emergent structures/dynamics that I believe need
to be there to have
human-level cognition -- and there is no simple prototype system that
will do so.

You celebrate Jeff Hawkins' prototype systems, but have you tried
them?  He's built
(or, rather Dileep George has built)
an image classification engine, not much different in performance from
many others out there.
It's nice work but it's not really an AGI prototype; it's an image classifier.
He may be sort-of labeling it a prototype of his AGI approach -- but
really, it doesn't prove anything
dramatic about his AGI approach.  No one who inspected his code and
ran it would think that it
did provide such proof.

 There are at least two stages of creative psychological development - which
 you won't find in any literature. The first I'd call simply original
 thinking, the second is truly creative thinking. The first stage is when
 people realise they too can have new ideas and get hooked on the excitement
 of producing them. Only much later comes the second stage, when thinkers
 realise that truly creative ideas have to be grounded. Arguably, the great
 majority of people who may officially be labelled as creatives, never get
 beyond the first stage - you can make a living doing just that. But the most
 beautiful and valuable ideas come from being repeatedly refined against the
 evidence. People resist this stage because it does indeed mean a lot of
 extra work , but it's worth it.  (And it also means developing that inner
 faculty which calls for actual evidence).

OK, now you're making a very different critique than what you started
with though.

Before you were claiming there are no creative ideas in AGI.

Now, when confronted with creative ideas, you're complaining that they're not
grounded via experimental validation.

Well, yeah...

And the problem is that if one's creative ideas pertain to the
dynamics of large-scale,
complex software systems, then it takes either a lot of time or a lot
of money to achieve
this validation that you mention.

It is not the case that I (and other AGI researchers) are somehow
psychologically
undesirous of seeing our creative ideas explored via experiment.  It
is, rather, the case
that doing the relevant experiments requires a LOT OF WORK, and we are
few in number
with relatively scant resources.

What I am working toward, with Novamente and soon with OpenCog as
well, is precisely
the empirical exploration of the various creative ideas of myself,
others whose work has
been built on in the Novamente design, and my colleagues...

-- Ben G

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:

Distributed search (was RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research])

2007-12-05 Thread Matt Mahoney

--- Ed Porter [EMAIL PROTECTED] wrote:

 Matt,
 
 Perhaps you're right.
 
 But one problem is that big Google-like compuplexes in the next five to ten
 years will be powerful enough to do AGI, and they will be much more efficient
 for AGI search because the physical closeness of their machines will make it
 possible for them to perform the massive interconnection needed for powerful
 AGI much more efficiently.

Google controls about 0.1% of the world's computing power.  But I think their
ability to achieve AGI first will not be so much due to the high bandwidth of
their CPU cluster, as that nobody controls the other 99.9%.

Centralized search tends to produce monopolies as the cost of entry goes up. 
It is not so bad now because Google still has a (dwindling) set of
competitors.  They can't yet hide content that threatens them.

Distributed search like Wikia/Atlas/Grub is interesting, but if people don't
see a compelling need for it, it won't happen.  How big will it have to get
before it is better than Google?  File sharing networks would probably be a
lot bigger and more useful (with mostly legitimate content) if we could solve
the distributed search problem.


-- Matt Mahoney, [EMAIL PROTECTED]

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=72969535-74e4ee


Distributed message pool (was RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research])

2007-12-05 Thread Matt Mahoney
--- John G. Rose [EMAIL PROTECTED] wrote:

  From: Matt Mahoney [mailto:[EMAIL PROTECTED]
  My design would use most of the Internet (10^9 P2P nodes).  Messages would
  be natural language text strings, making no distinction between documents,
  queries, and responses.  Each message would have a header indicating the
  ID and time stamp of the originator and any intermediate nodes through
  which the message was routed.  A message could also have attached files.
  Each node would have a cache of messages and its own policy on which
  messages it decides to keep or discard.
  
  The goal of the network is to route messages to other nodes that store
  messages with matching terms.  To route an incoming message x, it matches
  terms in x to terms in stored messages and sends copies to nodes that
  appear in those headers, appending its own ID and time stamp to the header
  of the outgoing copies.  It also keeps a copy, so that the receiving nodes
  know that it has a copy of x (at least temporarily).
  
  The network acts as a distributed database with a distributed search
  function.  If X posts a document x and Y posts a query y with matching
  terms, then the network acts to route x to Y and y to X.
 
 
 The very tricky but required part of creating a global network like this is
 going from zero nodes to whatever the goal is. I think that much of the design
 emphasis needs to be put into the growth function. If you have 50 nodes
 running how do you get to 500? And 500 to 5,000? And then if it goes down
 from 50,000 to 10,000 fast how is it revived before crash? Engineering
 expertise, ingenuity + maybe psychological and sociological wisdom can be
 used to make this happen. And we all know that the growth could happen
 quickly, even overnight. 

Getting the network to grow means providing enough incentive that people will
want to install your software.  A distributed message pool offers two
services: distributed search and a message posting service.  Information has
negative value, so it is the second service that provides the incentive.  You
type your message into a client window, and it instantly becomes available to
anyone who enters a query with matching terms.

 Then once getting to 10^9 nodes they have to be maintained or they can die
 quickly and even instantaneously. 

How?  A peer would be a piece of software that people would use every day, like a
web browser or email.  People aren't going to suddenly decide to uninstall
them all at once or turn off their computers.  One possible scenario is a
virus or worm spreading quickly from peer to peer.  Hopefully there will be a
wide variety of peers offering different services, so that individual
vulnerabilities could affect only a small part of the network.

 Having an intelligent botnet has its advantages. Once it's running and users
 try to uninstall it, the botnet can try to fight for survival by reasoning
 with the users. You could make it such that a user has to verbally
 communicate with it to remove it. The botnet could stall and ask things like
 Why are you doing this to me after all I have done for you? User:sorry
 charlie, I command you to uninstall! Bot:OK let's cut a deal... I know we
 can work this out...

Well, I expect the intelligence to come from having a large number of
specialized but relatively dumb peers, and a network that can direct your
queries to the right ones.  Peers would individually be under the control of
their human owners, just as web servers and clients are now.  It's not like
you could command the Internet to uninstall anyway.

Eventually we will need to deal with the problem of the network becoming
smarter than us, but I think the threshold of concern is when the collective
computing power in silicon exceeds the collective computing power in carbon. 
Right now the Internet has about as much computing power as a few hundred
human brains, but we still have a ways to go to the singularity.


-- Matt Mahoney, [EMAIL PROTECTED]

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=73000478-537c13


RE: Distributed search (was RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research])

2007-12-05 Thread Ed Porter
I have a lot of respect for Google, but I don't like monopolies, whether it
is Microsoft or Google.  I think it is vitally important that there be
several viable search competitors.

I wish this wiki one luck.  As I said, it sounds a lot like your idea.

Ed Porter

-Original Message-
From: Matt Mahoney [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 05, 2007 9:24 PM
To: agi@v2.listbox.com
Subject: Distributed search (was RE: Hacker intelligence level [WAS Re:
[agi] Funding AGI research])


--- Ed Porter [EMAIL PROTECTED] wrote:

 Matt,
 
 Perhaps you're right.
 
 But one problem is that big Google-like compuplexes in the next five to ten
 years will be powerful enough to do AGI, and they will be much more
 efficient for AGI search because the physical closeness of their machines
 will make it possible for them to perform the massive interconnection
 needed for powerful AGI much more efficiently.

Google controls about 0.1% of the world's computing power.  But I think their
ability to achieve AGI first will not be so much due to the high bandwidth of
their CPU cluster, as that nobody controls the other 99.9%.

Centralized search tends to produce monopolies as the cost of entry goes up.
It is not so bad now because Google still has a (dwindling) set of
competitors.  They can't yet hide content that threatens them.

Distributed search like Wikia/Atlas/Grub is interesting, but if people don't
see a compelling need for it, it won't happen.  How big will it have to get
before it is better than Google?  File sharing networks would probably be a
lot bigger and more useful (with mostly legitimate content) if we could solve
the distributed search problem.


-- Matt Mahoney, [EMAIL PROTECTED]

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?;

-
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=8660244id_secret=73068614-a9079e