Re: [agi] database access fast enough?

2008-04-17 Thread J. Andrew Rogers


On Apr 16, 2008, at 9:51 PM, YKY (Yan King Yin) wrote:

Typically we need to retrieve many nodes from the DB to do inference.
The nodes may be scattered around the DB.  So it may require *many*
disk accesses.  My impression is that most DBMS are optimized for
complex queries but not for large numbers of simple retrievals -- am I
correct about this?



No, you are not correct about this.  All good database engines use a  
combination of clever adaptive cache replacement algorithms (read:  
keeps stuff you are most likely to access next in RAM) and cost-based  
optimization (read: optimizes performance by adaptively selecting  
query execution algorithms based on measured resource access costs) to  
optimize performance across a broad range of use cases.  For highly  
regular access patterns (read: similar query types and complexity),  
the engine will converge on very efficient access patterns and  
resource management that match this usage.  For irregular access  
patterns, it will attempt to dynamically select the best options given  
recent access history and resource cost statistics -- not always the  
best result (on occasion hand optimization could do better), but more  
likely to produce good results than simpler rule-based optimization on  
average.
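The recency-plus-frequency bookkeeping described above can be sketched in toy form. This is a deliberately simplified cousin of real algorithms such as ARC or 2Q, not any engine's actual implementation; the class and method names are invented for illustration:

```python
from collections import OrderedDict

class TwoQueueCache:
    """Toy cache with one queue for recently-seen pages and one for
    frequently-seen pages -- a simplified cousin of ARC/2Q, not the real thing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()    # pages seen once, ordered oldest-first
        self.frequent = OrderedDict()  # pages seen more than once

    def get(self, key):
        if key in self.frequent:
            self.frequent.move_to_end(key)   # refresh recency in the hot queue
            return self.frequent[key]
        if key in self.recent:
            value = self.recent.pop(key)     # second hit: promote to hot queue
            self.frequent[key] = value
            return value
        return None                          # miss: caller must fetch from disk

    def put(self, key, value):
        if key in self.recent or key in self.frequent:
            return
        self.recent[key] = value
        while len(self.recent) + len(self.frequent) > self.capacity:
            victim = self.recent if self.recent else self.frequent
            victim.popitem(last=False)       # evict the oldest entry
```

A single hot page survives a stream of one-off accesses here, which is the adaptive behavior being described: pages touched twice earn a protected spot, while scan traffic churns only through the "recent" queue.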


Note that by good database engine I am talking about engines that actually  
support these kinds of tightly integrated and adaptive management  
features: Oracle, DB2, PostgreSQL, et al.  This does *not* include  
MySQL, which is a naive and relatively non-adaptive engine, and which  
scales much worse and is generally slower than PostgreSQL anyway if  
you are looking for a free open source solution.



I would also point out that different engines are optimized for  
different use cases.  For example, while Oracle and PostgreSQL share  
the same transaction model, Oracle's design decisions optimize for  
massive numbers of small concurrent update transactions, while  
PostgreSQL's optimize for massive numbers of small concurrent  
insert/delete transactions.  Databases based on other transaction  
models, such as IBM's DB2, sacrifice extreme write concurrency for  
superior read-only performance.  There are unavoidable tradeoffs with  
such things, so the market has a diverse ecology of engines that have  
chosen different sets of tradeoffs, and buyers should be aware of what  
these tradeoffs are if scalable performance is a criterion.



J. Andrew Rogers

---
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244id_secret=101455710-f059c4
Powered by Listbox: http://www.listbox.com


Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread Nikolay Ognyanov

IMHO:

The stated expected benefit of AGI development is overly ambitious on the
science-and-technology side and not ambitious enough on the social-and-economic
side. For AGI to become the Next Big Thing it does not really have to come
up with the best medical researcher. Nor would a great medical researcher
have as much impact on the way current civilization works as the replacement
of human workers in the services sector.  The impact of previous technology
revolutions can be described in a very fundamental way as freeing (liberating?
discharging?) people from engagement in hunting and the like, then agriculture,
then industry. Well, industry is still a work in progress, and AGI could help
there too, but the direction is clear. Services are the next area of human
social and economic activity to benefit, and to suffer, at the same scale as
the others did earlier from technology. This is the most obvious general
social role and selling point of AGI, at least until/unless it becomes a true
deus ex machina ;) : to liberate (but also to discharge, which is going to be
a huge adoption/penetration problem) humans from providing economically
significant services to other humans. Which such roles AGI addresses, and how
well it fulfills them, should be the key metric if it is to be sold outside a
community which is motivated by the intellectual challenge alone.

So IMHO if you want to sell AGI to investors, you had better start with
replacing travel agents, brokers, receptionists, personal assistants, etc.,
rather than researchers.

Regards
Nikolay

Richard Loosemore wrote:


I have stuck my neck out and written an Open Letter to AGI (Artificial 
General Intelligence) Investors on my website at http://susaro.com.


All part of a campaign to get this field jumpstarted.

Next week I am going to put up a road map for my own development project.




Richard Loosemore







--

*Nikolay Ognyanov, PhD*
Chief Technology Officer
*TravelStoreMaker.com Inc.* http://www.travelstoremaker.com/
Phone: +359 2 933 3832
Fax: +359 2 983 6475



Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread Richard Loosemore

Nikolay Ognyanov wrote:

IMHO:

[snip]

So IMHO if you want to sell AGI to investors, you had better start with
replacing travel agents, brokers, receptionists, personal assistants, etc.,
rather than researchers.


I'm sorry, but this makes no sense at all:  this is a complete negation 
of what AGI means.


If you could build a (completely safe, I am assuming) system that could 
think in *every* way as powerfully as a human being, what would you 
teach it to become:


1) A travel agent.

2) A medical researcher who could learn to be the world's leading 
specialist in a particular field, and then be duplicated so that you 
instantly had 1,000 world-class specialists in that field.


3) An expert in AGI system design, who could then design a faster 
generation of AGI systems, so that, as a researcher in any scientific 
field, these second-generation systems could generate new knowledge 
faster than all the human scientists and engineers on the planet.


?

To say to an investor that AGI would be useful because we could use them 
to build travel agents and receptionists is to utter something 
completely incoherent.


This is the Everything Just The Same, But With Robots fallacy.



Richard Loosemore




Re: [agi] database access fast enough?

2008-04-17 Thread YKY (Yan King Yin)
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:

 No, you are not correct about this.  All good database engines use a
 combination of clever adaptive cache replacement algorithms and cost-based
 optimization to optimize performance across a broad range of use cases.
 [snip]


Thanks for the info -- I studied database systems almost a decade ago,
so I can hardly remember the details =)

ARC (Adaptive Replacement Cache) seems to be one of the most popular
methods; it keeps track of both frequently used and recently used
items.  Unfortunately, for AGI / inference purposes, those may not be
the right optimization objectives.

The requirement of inference is that we need to access a lot of
*different* nodes, but the same nodes may not be required many times.
Perhaps what we need is to *bundle* up nodes that are associated with
each other, so we can read a whole block of nodes with 1 disk access.
This requires a very special type of storage organization -- it seems
that existing DBMSs don't have it =(

YKY



Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread J Storrs Hall, PhD
On Thursday 17 April 2008 04:47:41 am, Richard Loosemore wrote:
 If you could build a (completely safe, I am assuming) system that could 
 think in *every* way as powerfully as a human being, what would you 
 teach it to become:
 
 1) A travel Agent.
 
 2) A medical researcher who could learn to be the world's leading 
 specialist in a particular field,...

Travel agent. Better yet, housemaid. I can teach it to become these things 
because I know how to do them. Early AGIs will be more likely to be 
successful at these things because they're easier to learn. 

This is sort of like Orville Wright asking, If I build a flying machine, 
what's the first use I'll put it to: 
1) Carrying mail.
2) A manned moon landing.



Re: [agi] database access fast enough?

2008-04-17 Thread YKY (Yan King Yin)
To use an example,

If a lot of people search for Harry Potter, then a conventional
database system would make future retrieval of the Harry Potter node
faster.

But the requirement of the inference system is such that, if Harry
Potter is fetched, then we would want *other* things that are
associated with Harry Potter to be retrieved faster in the future --
for example, items such as JK Rowling or fantasy fiction.
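That associative-warming behaviour might look like this in miniature (a hypothetical cache wrapper, not any existing DBMS feature; the node names echo the example above):

```python
class AssociativePrefetchCache:
    """When one node is fetched, also warm the cache with its associates,
    on the theory that inference will touch those next (hypothetical sketch)."""

    def __init__(self, backing_store, associations):
        self.store = backing_store        # node id -> payload (the slow path)
        self.associations = associations  # node id -> ids of associated nodes
        self.cache = {}
        self.slow_fetches = 0

    def _fetch(self, node_id):
        self.slow_fetches += 1            # stand-in for a disk access
        self.cache[node_id] = self.store[node_id]

    def get(self, node_id):
        if node_id not in self.cache:
            self._fetch(node_id)
            # Prefetch associates now, while we are "near" them on disk,
            # so later requests for them are cache hits.
            for related in self.associations.get(node_id, ()):
                if related not in self.cache:
                    self._fetch(related)
        return self.cache[node_id]
```

After one fetch of the Harry Potter node, requests for JK Rowling or fantasy fiction are served from cache without touching the slow path.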

YKY



Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread Richard Loosemore

J Storrs Hall, PhD wrote:

On Thursday 17 April 2008 04:47:41 am, Richard Loosemore wrote:
If you could build a (completely safe, I am assuming) system that could 
think in *every* way as powerfully as a human being, what would you 
teach it to become:


1) A travel Agent.

2) A medical researcher who could learn to be the world's leading 
specialist in a particular field,...


Travel agent. Better yet, housemaid. I can teach it to become these things 
because I know how to do them. Early AGIs will be more likely to be 
successful at these things because they're easier to learn. 


Yes, that shows deep analysis and insight into the problem.

I can just see the first AGI corporation now, having spent a hundred 
million dollars in development money, deciding to make a profit by 
selling a housemaid robot that will replace the cheap, almost-slave 
labor coming across the border from Mexico.


Of course, it would not occur to that company to develop their systems 
just a little more and get the AGI to do high-value intellectual work.





Richard Loosemore



Re: [agi] database access fast enough?

2008-04-17 Thread Mark Waser
No.  You are not correct.  Most DBMSs compile and optimize complex queries 
as a separate operation before doing data retrieval -- but even the most 
complex query is actually implemented as a series of simple retrievals 
(which is what the database is truly designed to do).  On the other hand, 
communication to and from your database -- particularly across a network -- 
is very likely to be a speed problem.  My solution is to implement your 
inference in the database engine itself.  That way the database handles all 
of your memory management, caching, storage, etc.
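One way to sketch the "run the inference inside the engine" idea is with an embedded SQLite database, using a recursive query as a stand-in for a server-side stored procedure. The table and data are invented for illustration; a real deployment would use the server engine's own procedural language (T-SQL, PL/pgSQL, etc.):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("animal", "mammal"), ("mammal", "dog"), ("mammal", "cat")])

# The transitive-closure inference runs entirely inside the engine: one call
# replaces a chain of per-node retrievals across the application boundary.
rows = conn.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT 'animal'
        UNION
        SELECT e.child FROM edges e JOIN reachable r ON e.parent = r.node
    )
    SELECT node FROM reachable
""").fetchall()

reachable = {node for (node,) in rows}
```

The application issues one statement and gets the closed set back; every intermediate node access happens inside the engine's own buffer management, which is the point of the advice above.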


- Original Message - 
From: YKY (Yan King Yin) [EMAIL PROTECTED]

To: agi@v2.listbox.com
Sent: Thursday, April 17, 2008 12:51 AM
Subject: [agi] database access fast enough?



For those using database systems for AGI, I'm wondering if the data
retrieval rate would be a problem.

Typically we need to retrieve many nodes from the DB to do inference.
The nodes may be scattered around the DB.  So it may require *many*
disk accesses.  My impression is that most DBMS are optimized for
complex queries but not for large numbers of simple retrievals -- am I
correct about this?

YKY







Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread Mark Waser
So IMHO if you want to sell AGI to investors, you had better start with
replacing travel agents, brokers, receptionists, personal assistants,
etc., rather than researchers.


I'm sorry, but this makes no sense at all:  this is a complete negation of 
what AGI means.


Actually . . . . sorry, Richard . . . . but why does it matter what AGI 
means?  You are trying to sell a product for money.  Why do you insist on 
attempting to sell someone something that they don't want just because *you* 
believe that it's better than what they do want?  Why not just sell them 
what they want (since they get it for free with what you want) and be happy 
that they're willing to fund you?


If you could build a (completely safe, I am assuming) system that could 
think in *every* way as powerfully as a human being, what would you teach 
it to become:

1) A travel Agent.
2) A medical researcher 3) An expert in AGI system design,


4) All of the above.  But I'd market it as a travel agent to the people 
who want a travel agent, and as a medical researcher to the drug companies 
(the AGI expert would have it figured out but would have no spare cash :-).


To say to an investor that AGI would be useful because we could use them 
to build travel agents and receptionists is to utter something completely 
incoherent.


Not at all.  It is catering to their desires and refraining from forcibly 
educating them.  Where is the harm?  It's certainly better than getting the 
door slammed in your face.



This is the Everything Just The Same, But With Robots fallacy.


No, it's not because you're not saying that everything is going to be the 
same.  All you're saying is that travel agents *can* be replaced without 
insisting on pointing out that *EVERYTHING* is likely to be replaced.






Re: [agi] database access fast enough?

2008-04-17 Thread Stephen Reed
YKY

Here is what I learned from implementing the Texai knowledge base. It persists
symbolic statements about concepts.

1. I designed an SQL schema to persist OpenCyc in its full CycL form, in
MySQL on SuSE 64-bit Linux.  My Java application driving MySQL dramatically
slowed down when the number of rows exceeded 20 million, as compared to the
initial load of 5 million rows.

2. I then tried Oracle Berkeley DB Java Edition (open source), which provides
no SQL query facility; instead one programs directly to its API for inserts,
queries, updates and so forth.  It is faster than MySQL for my large KB, but
uses four times as much disk space due to its method of inserting new rows at
the end of the file and leaving lots of free space.

3. I then studied partitioning, which means breaking up the monolithic KB
into smaller databases in which accesses are expected to be clustered.  And I
studied sharding, which means slicing up a database into logical segments that
are hosted by separate db engines, typically with separate disk filesystems.

4. I began writing my own storage engine for a fast, space-efficient,
partitioned and sharded knowledge base, soon realizing that this was far too
big a task for a sole developer.

5. Revisiting my project's object-persistence needs, and thinking more about
interoperability with semantic web technologies, I decided to convert my
existing KB to an RDF-compatible form and then to evaluate RDF quad stores.

6. After some analysis, I chose to evaluate the Sesame 2 RDF store, which is
Java based and open source and thus very compatible with my other components.
In Texai, RDF queries have a simpler form than SQL queries when retrieving
logical statements from a store.  For example, in SQL my schema had to provide
separate tables for each object type: concept term, functional term, string,
boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3
rule, arity-4 rule and arity-5 rule.  Many of these tables would have to be
joined for a typical query (e.g. what concepts subsume a given concept?).

7. My development Linux computer has 4 GB of memory, and Linux has a feature
called tmpfs which permits mounting a directory in RAM.  I partitioned my KB
into separate KBs of less than six million rows each.  In Sesame these are
less than one GB in size, and I can therefore put any one of them in tmpfs,
running that application-relevant part of the KB at RAM speed.  Experiments
demonstrate about a 10-times speedup.
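The tmpfs trick might be scripted along these lines. This is a hypothetical helper, not Texai's actual code; /dev/shm is assumed to be a tmpfs mount, as it is on most Linux distributions, and the on-disk file stays authoritative:

```python
import os
import shutil
import tempfile

def stage_partition_in_ram(partition_path, ram_dir="/dev/shm"):
    """Copy one KB partition into a RAM-backed directory so reads run at
    memory speed; the on-disk file remains the durable copy."""
    hot_dir = tempfile.mkdtemp(prefix="kb-", dir=ram_dir)
    hot_path = os.path.join(hot_dir, os.path.basename(partition_path))
    shutil.copy2(partition_path, hot_path)
    return hot_path  # point the application at this RAM-resident copy
```

A deployment would then replay a background transaction log against the on-disk copy, so a crash loses only the volatile RAM-resident copy.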
8. When Texai is deployed, I expect that the application will log its
transactions to disk as a background process as a safeguard against losing
the volatile KB in tmpfs.

Hope this information is useful.
-Steve
 
Stephen L. Reed

Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

- Original Message 
From: YKY (Yan King Yin) [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, April 16, 2008 11:51:35 PM
Subject: [agi] database access fast enough?

 For those using database systems for AGI, I'm wondering if the data
retrieval rate would be a problem.

Typically we need to retrieve many nodes from the DB to do inference.
The nodes may be scattered around the DB.  So it may require *many*
disk accesses.  My impression is that most DBMS are optimized for
complex queries but not for large numbers of simple retrievals -- am I
correct about this?

YKY








  



Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread J Storrs Hall, PhD
Well, I haven't seen any intelligent responses to this so I'll answer it 
myself:

On Thursday 17 April 2008 06:29:20 am, J Storrs Hall, PhD wrote:
 On Thursday 17 April 2008 04:47:41 am, Richard Loosemore wrote:
  If you could build a (completely safe, I am assuming) system that could 
  think in *every* way as powerfully as a human being, what would you 
  teach it to become:
  
  1) A travel Agent.
  
  2) A medical researcher who could learn to be the world's leading 
  specialist in a particular field,...
 
 Travel agent. Better yet, housemaid. I can teach it to become these things 
 because I know how to do them. Early AGIs will be more likely to be 
 successful at these things because they're easier to learn. 
 
 This is sort of like Orville Wright asking, If I build a flying machine, 
 what's the first use I'll put it to: 
 1) Carrying mail.
 2) A manned moon landing.

Q: You've got to be kidding. There's a huge difference between a mail-carrying 
fabric-covered open-cockpit biplane and the Apollo spacecraft. It's not 
comparable at all.

A: It's only about 50 years' development. More time elapsed between railroads 
and biplanes. 

Q: Do you think it'll take 50 years to get from travel agents to medical 
researchers?

A: No, the pace of development has sped up, and will speed up more with 
AGI. But as in the mail/moon example, the big jump will be getting off the 
ground in the first place.

Q: So why not just go for the researcher? 

A: Same reason Orville didn't go for the moon rocket. We build Rosie the 
maidbot first because:
1) we know very well what it's actually supposed to do, so we know if it's 
learning it right
2) we even know a bit about how its internal processing -- vision, motion 
control, recognition, navigation, etc -- works or could work, so we'll have 
some chance of writing programs that can learn that kind of thing.
3) It's easier to learn to be a housemaid. There are lots of good examples. 
The essential elements of the task are observable or low-level abstractions. 
While the robot is learning to wash windows, we the AGI researchers are going 
to learn how to write better learning algorithms by watching how it learns.
4) When, not if, it screws up, a natural part of the learning process, 
there'll be broken dishes and not a thalidomide disaster.

The other issue is that the hard part of this is the learning. Say it takes a 
teraop to run a maidbot well, but a petaop to learn to be a maidbot. We run the 
learning on our one big machine and sell the maidbots cheap with 0.1% of the 
cpu. But being a researcher is all learning -- so each one would need the 
whole shebang for each copy. That's a decade of Moore's Law ... and at least 
that much AGI research.

Josh



Re: [agi] database access fast enough?

2008-04-17 Thread Mark Waser
And, as far as I'm concerned, the last clause of item 4 and the transition to 5 
and 6 clearly demonstrates why Steve seems to be making a lot of progress 
compared to everyone else.
  - Original Message - 
  From: Stephen Reed 
  To: agi@v2.listbox.com 
  Sent: Thursday, April 17, 2008 10:23 AM
  Subject: Re: [agi] database access fast enough?


  YKY

  Here is what I learned from implementing the Texai knowledge base. It 
persists symbolic statements about concepts.

  1. I designed an SQL schema to persist OpenCyc in its full CycL form, in
MySQL on SuSE 64-bit Linux.  My Java application driving MySQL dramatically
slowed down when the number of rows exceeded 20 million as compared to the
initial load of 5 million rows.

  2. I then tried Oracle Berkeley DB Java Edition (open source) which
provides no SQL query facility, instead one programs directly to its API for
inserts, queries, updates and so forth.  It is faster than MySQL for my large
KB, but uses four times as much disk space due to its method of inserting new
rows at the end of the file, and having lots of free space.

  3. I then studied partitioning, which means to break up the monolithic KB
into smaller databases in which accesses are expected to be clustered.  And I
studied sharding, which means to slice up a database into logical segments that
are hosted by separate db engines, typically with separate disk filesystems.

  4. I began writing my own storage engine, for a fast, space-efficient,
partitioned and sharded knowledge base, soon realizing that this was far too
big a task for a sole developer.

  5. Revisiting my project object persistence needs, and thinking more about
interoperability with semantic web technologies, I decided to convert my
existing KB to an RDF-compatible form and then to evaluate RDF quad stores.

  6. After some analysis, I chose to evaluate the Sesame 2 RDF store, which
is Java based and open source and thus very compatible with my other
components.  In Texai, RDF queries have a simpler form than SQL queries when
retrieving logical statements from a store.  For example, in SQL my schema had
to provide separate tables for each object type:  concept term, functional
term, string, boolean, long integer, double, statement, arity-1 rule, arity-2
rule, arity-3 rule, arity-4 rule and arity-5 rule.  Many of these tables would
have to be joined for a typical query (e.g. what concepts subsume a given
concept?).

  7. My development Linux computer has 4 GB of memory, and Linux has a
feature called tmpfs which permits mounting a directory in RAM.  I partitioned
my KB into separate KBs of less than six million rows each.  In Sesame these
are less than one GB in size and I can therefore put any one of them in tmpfs -
running that application-relevant part of the KB at RAM speed.  Experiments
demonstrate about a 10 times speedup.

  8. When Texai is deployed, I expect that the application will log its
transactions to disk as a background process as a safeguard against losing the
volatile KB in tmpfs.
  Hope this information is useful.
  -Steve


  Stephen L. Reed


  Artificial Intelligence Researcher
  http://texai.org/blog
  http://texai.org
  3008 Oak Crest Ave.
  Austin, Texas, USA 78704
  512.791.7860



  - Original Message 
  From: YKY (Yan King Yin) [EMAIL PROTECTED]
  To: agi@v2.listbox.com
  Sent: Wednesday, April 16, 2008 11:51:35 PM
  Subject: [agi] database access fast enough?

  For those using database systems for AGI, I'm wondering if the data
  retrieval rate would be a problem.

  Typically we need to retrieve many nodes from the DB to do inference.
  The nodes may be scattered around the DB.  So it may require *many*
  disk accesses.  My impression is that most DBMS are optimized for
  complex queries but not for large numbers of simple retrievals -- am I
  correct about this?

  YKY









Re: [agi] database access fast enough?

2008-04-17 Thread J. Andrew Rogers


On Apr 17, 2008, at 2:50 AM, YKY (Yan King Yin) wrote:

ARC (Adaptive Cache Replacement) seems to be one of the most popular
methods, and it's based on keeping track of frequently used and
recently used.  Unfortunately, for AGI / inference purposes, those
may not be the right optimization objectives.



It is a cache replacement algorithm; what would be the right  
optimization objective for such an algorithm?  There is a lot of  
cleverness in the use of the cache to maximize cache efficiency beyond  
the cache replacement algorithm -- it is one of the most heavily  
engineered parts of a database engine.


As an FYI, ARC is patented by IBM.  PostgreSQL uses a different but  
similar algorithm that is indistinguishable from ARC in benchmarks  
(PostgreSQL briefly implemented ARC itself, not realizing that it was  
patented).




The requirement of inference is that we need to access a lot of
*different* nodes, but the same nodes may not be required many times.
Perhaps what we need is to *bundle* up nodes that are associated with
each other, so we can read a whole block of nodes with 1 disk access.
This requires a very special type of storage organization -- it seems
that existing DBMSs don't have it =(



Again, most good database engines can do this, as it is a standard  
access pattern for databases, and most databases can solve this  
problem in multiple ways.  As an example, clustering and  
index-organization features in databases address your issue here.
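One concrete illustration of index organization, here via SQLite's WITHOUT ROWID tables, which store rows physically in primary-key order (a simplified stand-in for Oracle's index-organized tables or PostgreSQL's CLUSTER; the schema is invented): giving associated nodes adjacent keys co-locates them in the same B-tree pages, so one range scan retrieves a whole cluster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Index-organized table: with WITHOUT ROWID, rows live in the primary-key
# B-tree itself, physically ordered by (cluster_id, node_id).
conn.execute("""
    CREATE TABLE nodes (
        cluster_id INTEGER,
        node_id    TEXT,
        payload    TEXT,
        PRIMARY KEY (cluster_id, node_id)
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "harry_potter", "novel series"),
    (1, "jk_rowling",   "author"),
    (1, "fantasy",      "genre"),
    (2, "postgresql",   "database engine"),
])

# Fetching a cluster is one contiguous range scan, not scattered point reads.
cluster = {n for (n,) in conn.execute(
    "SELECT node_id FROM nodes WHERE cluster_id = 1")}
```

The design choice is simply which columns lead the primary key: whatever sorts first determines what ends up stored together on disk.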


It is pretty difficult to come up with an access-pattern use case that
a good database engine cannot be optimized for.  These are very
densely engineered pieces of software, designed to be very fast while
scaling well in multiple dimensions and adapting to varying workloads.
On the other hand, if your use case is simple enough, you can gain
significant speed for modest effort by writing your own engine that is
purpose-built for your needs.



J. Andrew Rogers



Re: [agi] database access fast enough?

2008-04-17 Thread J. Andrew Rogers


On Apr 17, 2008, at 6:07 AM, Mark Waser wrote:
I have to laugh at your total avoidance of Microsoft SQL Server  
which is arguably faster and better scaling for truly mixed use than  
everything except possibly Oracle on ordinary hardware; which is  
much easier to use than Oracle; and which is the easiest to actually  
put *GOOD* code in the database engine itself (particularly when  
compared to Oracle's *REALLY* poor java imitation).



Discussing SQL Server does not generalize well, because Microsoft
reimplements the core engine design with almost every release, once
they realize they hosed the design in the last one.  For example, up
until SQL Server 2005 the transaction engine was weak enough that
PostgreSQL could spank it in transaction throughput -- in 2005 they
switched to a transaction model more like PostgreSQL's and Oracle's
and gained some parity.  SQL Server still does not really distribute
all that easily, unlike Oracle or PostgreSQL.


SQL Server versions before the current two-year-old one were pretty
much dogs in a lot of ways.  The most recent version is, as you state,
a pretty solid database engine.  Oracle is a major pain in the ass to
use but does scale well, though for many OLTP loads it is barely
faster than PostgreSQL these days.



If putting your code in the engine is the goal, PostgreSQL wins by a  
country mile.  The entire engine from front to back is deeply hackable  
with very clean APIs and you can even safely bind binary code into the  
engine at runtime.  That the transaction engine scales quite well is  
just a bonus.  People have already written hooks for a dozen languages  
into it.  I've written performance-sensitive customizations of  
PostgreSQL in the past, and for purposes like that it can often be  
much faster than the commercial alternatives, as the alternatives tend  
to be relatively feature poor and shallow when it comes to engine  
customization.  Making deep and very flexible customization a safe  
core feature was a design decision tradeoff in PostgreSQL that is  
somewhat unique to it.  You can do a lot of really cool software  
implementation tricks with it that Oracle and SQL Server do not do.


J. Andrew Rogers



Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread Ben Goertzel
  We may well see a variety of proto-AGI applications in different
  domains, sorta midway between narrow-AI and human-level AGI, including
  stuff like

  -- maidbots

  -- AI financial traders that don't just execute machine learning
  algorithms, but grok context, adapt to regime changes, etc.

  -- NL question answering systems that grok context and piece together
  info from different sources

  -- artificial scientists capable of formulating nonobvious hypotheses
  and validating them via data analysis, including doing automated data
  preprocessing, etc.

And not to forget, of course, smart virtual pets and avatars in games
and virtual worlds ;-))



Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread Ben Goertzel
Hmmm...

It's pretty hard to project the timing of different early-stage AGI
applications, as this depends on the particular route taken to AGI,
and there are many possible routes...

We may well see a variety of proto-AGI applications in different
domains, sorta midway between narrow-AI and human-level AGI, including
stuff like

-- maidbots

-- AI financial traders that don't just execute machine learning
algorithms, but grok context, adapt to regime changes, etc.

-- NL question answering systems that grok context and piece together
info from different sources

-- artificial scientists capable of formulating nonobvious hypotheses
and validating them via data analysis, including doing automated data
preprocessing, etc.

Then, after this phase, we may finally see the emergence of unified
AGI systems with true human-level AGI.

**Or**, it could happen that one of the above apps (or something not
on my list) advances way faster than the others, for fundamental AI
reasons or simply for practical economic reasons ... or due to luck...

**Or**, it could well happen that someone gets all the way to
human-level AGI before any of the above proto-AGI applications really
becomes feasible and economically viable.  In that case the answer
will indeed be: Duh, the AGI can do anything...

Which of these alternatives will happen is not obvious to me.  It's
not even obvious to me under the hypothetical assumption that the
Novamente/OpenCog approach is gonna be the one that gets us to
human-level AGI ... let alone if I drop that assumption and think
about the problem from the perspective of the broad scope of possible
AGI architectures.

So I am a bit perplexed that some folks on this list are so
surpassingly **confident** as to which route is going to unfold  I
don't want to get all Eliezer on you, but really, some reflection on
the human brain's tendency toward overconfidence might be in order ;-O

-- Ben G



On Thu, Apr 17, 2008 at 10:30 AM, J Storrs Hall, PhD [EMAIL PROTECTED] wrote:
 Well, I haven't seen any intelligent responses to this so I'll answer it
  myself:


  On Thursday 17 April 2008 06:29:20 am, J Storrs Hall, PhD wrote:
   On Thursday 17 April 2008 04:47:41 am, Richard Loosemore wrote:
If you could build a (completely safe, I am assuming) system that could
think in *every* way as powerfully as a human being, what would you
teach it to become:
   
1) A travel Agent.
   
2) A medical researcher who could learn to be the world's leading
specialist in a particular field,...
  
   Travel agent. Better yet, housemaid. I can teach it to become these things
   because I know how to do them. Early AGIs will be more likely to be
   successful at these things because they're easier to learn.
  
   This is sort of like Orville Wright asking, If I build a flying machine,
   what's the first use I'll put it to:
   1) Carrying mail.
   2) A manned moon landing.

  Q: You've got to be kidding. There's a huge difference between a 
 mail-carrying
  fabric-covered open-cockpit biplane and the Apollo spacecraft. It's not
  comparable at all.

  A: It's only about 50 years' development. More time elapsed between railroads
  and biplanes.

  Q: Do you think it'll take 50 years to get from travel agents to medical
  researchers?

  A: No, the pace of development has speeded up, and will speed up more so with
  AGI. But as in the mail/moon example, the big jump will be getting off the
  ground in the first place.

  Q: So why not just go for the researcher?

  A: Same reason Orville didn't go for the moon rocket. We build Rosie the
  maidbot first because:
  1) we know very well what it's actually supposed to do, so we know if it's
  learning it right
  2) we even know a bit about how its internal processing -- vision, motion
  control, recognition, navigation, etc -- works or could work, so we'll have
  some chance of writing programs that can learn that kind of thing.
  3) It's easier to learn to be a housemaid. There are lots of good examples.
  The essential elements of the task are observable or low-level abstractions.
  While the robot is learning to wash windows, we the AGI researchers are going
  to learn how to write better learning algorithms by watching how it learns.
  4) When, not if, it screws up, a natural part of the learning process,
  there'll be broken dishes and not a thalidomide disaster.

  The other issue is that the hard part of this is the learning. Say it takes a
  teraop to run a maidbot well, but a petaop to learn to be one. We run the
  learning on our one big machine and sell the maidbots cheap with 0.1% the
  cpu. But being a researcher is all learning -- so each one would need the
  whole shebang for each copy. A decade of Moore's Law ... and at least that of
  AGI research.

  Josh




Re: [agi] database access fast enough?

2008-04-17 Thread YKY (Yan King Yin)
On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote:

 You *REALLY* need to get up to speed on current database systems before you
 make more ignorant statements.

 First off, *most* databases RARELY go to the disk for reads.  Memory is
 cheap and the vast majority of complex databases are generally small enough
 that they are normally held in memory during normal operation.

That's true as of now, but let's think one or two steps further:  Do
you really think a mature AGI's (say with 3-6 year-old human
intelligence) KB can reside in RAM, entirely?


 Next, I suspect that whatever bundling you're talking about is likely to
 be along field boundaries and is likely going to be akin to just reading
 an entire FIELD table into memory (that will have the exact same structure
 as all other field tables but will be contiguous on disk so as to promote
 fast loads).

To clarify what I mean:

1.  the DB contains a large number of facts / rules (perhaps stored as
rows in SQL parlance)
2.  many of these rows have to be fetched for inference (Resolution
tests if a rule leads to a successful proof, but more often than not
the rules are discarded).
3.  the rows are scattered all around the DB

For example, let's say I want to infer something about Harry Porter
and JK Rowling, I would want to fetch these facts / rules:
1.  Harry Porter is a successful book series
2.  Harry Porter belongs to the fantasy genre
3.  JK Rowling is the author of Harry Porter
4.  JK Rowling is now richer than Queen Elizabeth II.
etc...

But I would probably NOT need facts / rules like:
1.  Einstein is the creator of General Relativity
2.  Water is heavier than oil
etc...

So we should keep track of what rules are usually used *together*, and
perhaps bring them into physically contagious storage.  I'm not sure
which DB feature(s) allow this...
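One crude way to get this effect, independent of any particular DB
feature, is to make the bundle itself the unit of storage: serialize a
group of co-used rules into a single row, so one fetch returns all of
them.  A hedged Python sketch (the bundle layout and topic key are
invented for illustration):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bundles (topic TEXT PRIMARY KEY, facts TEXT)")

# Rules that are usually used *together* are stored as one serialized
# bundle, so retrieving the group costs one read instead of many.
rowling_bundle = [
    "Harry Porter is a successful book series",
    "Harry Porter belongs to the fantasy genre",
    "JK Rowling is the author of Harry Porter",
]
db.execute("INSERT INTO bundles VALUES (?, ?)",
           ("rowling", json.dumps(rowling_bundle)))

# A single fetch recovers every associated fact.
(blob,) = db.execute("SELECT facts FROM bundles WHERE topic = ?",
                     ("rowling",)).fetchone()
facts = json.loads(blob)
assert facts == rowling_bundle
```

The cost is that bundle membership is fixed at write time; the
clustered-indexing features mentioned earlier in the thread get a
similar effect while keeping rows individually addressable.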

YKY



Re: [agi] database access fast enough?

2008-04-17 Thread YKY (Yan King Yin)
Hi Stephen,

Thanks for sharing this!  VERY few people have experience with this stuff...

On 4/17/08, Stephen Reed [EMAIL PROTECTED] wrote:
 4. I began writing my own storage engine, for a fast, space-efficient, 
 partitioned and sharded knowledge base, soon realizing that this was far too 
 big a task for a sole developer.

That seems like what we actually need.

 My development Linux computer has 4 GB of memory, and Linux has a feature 
 called tmpfs which permits mounting a directory in RAM.  I partitioned my KB 
 into separate KBs of less than six million rows each.  In Sesame these are 
 less than one GB in size and I can therefore put any one of them in tmpfs - 
 running that application-relevant part of the KB at RAM speed.   Experiments 
 demonstrate about a 10 times speedup.

If the inference requires a rule outside the sub-KB, you'd have to do
a very expensive swap.  I think this only works if you're sure the
entire inference is contained within a sub-KB.

YKY



Re: [agi] database access fast enough?

2008-04-17 Thread Ben Goertzel
Hi Mark,

  This is, by the way, my primary complaint about Novamente -- far too much
 energy, mind-space, time, and effort has gone into optimizing and repeatedly
 upgrading the custom atom table that should have been built on top of
 existing tools instead of being built totally from scratch.

Really, work on the AtomTable has been a small percentage of work on
the Novamente Cognition Engine ... and, the code running the AtomTable is
now pretty much the same as it was in 2001 (though it was tweaked to make it
64-bit compatible, back in 2004 ... and there has been ongoing bug-removal
as well...).  We wrote some new wrappers for the AtomTable
last year (based on STL containers), but that didn't affect the
internals, just the API.

It's true that a highly-efficient, highly-customizable graph database could
potentially serve the role of the AtomTable, within the NCE or OpenCog.

But that observation is really not
such a big deal.  Potentially, one could just wrap someone else's graph DB
behind the 2007 AtomTable API, and this change would be completely transparent
to the AI processes using the AtomTable.

However, I'm not convinced this would be a good idea.  There are a lot of
useful specialized indices in the AtomTable, and replicating all this in some
other graph DB would wind up being a lot of work ... and we could use that
time/effort on other stuff instead

Using a relational DB rather than a graph DB is not appropriate for the NCE
design, however.

But we've been over this before...

And, this is purely a software implementation issue rather than an AI issue,
of course.  The NCE and OpenCog designs require **some** graph or
hypergraph DB which supports the manual and automated creation of
complex customized indices ... and supports refined cognitive control
over what lives on disk and what lives in RAM, rather than leaving this
up to some non-intelligent automated process.  Given these requirements,
the choice of how to realize them in software is not THAT critical ... and
what we have there now works


-- Ben G



Re: [agi] database access fast enough?

2008-04-17 Thread YKY (Yan King Yin)
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:

 Again, most good database engines can do this, as it is a standard access
 pattern for databases, and most databases can solve this problem multiple
 ways.  As an example, clustering and index-organization features in
 databases address your issue here.

Thanks... clustered indexing looks promising, but I need to study it
in more detail to see if it really solves the problem...

YKY



Re: [agi] database access fast enough?

2008-04-17 Thread Stephen Reed
YKY said:
If the inference requires a rule outside the sub-KB, you'd have to do
a very expensive swap.  I think this only works if you're sure the
entire inference is contained within a sub-KB.

 
Right.  I envision Texai deployed as distributed agents operating within a 
hierarchical control system.  Each agent's mission will be scoped to require 
immediate access to only a cache of some KB partition.  Cache misses, 
hopefully infrequent, will incur the penalty you mention, either to local 
disk or, worse, to the network.  I also expect the system to be adaptive to 
whatever the user's computer allows with regard to resources (e.g. more RAM 
begets faster response).  I am also considering torrent-style transfers to 
satisfy cache misses.  As you point out, an AGI's KB query is likely to 
access other linked objects (e.g. spreading activation search).  So given 
that users will likely have asymmetric Internet connection bandwidth, it may 
be faster for large chunks of cache-filling KB data to be obtained 
simultaneously in slices from a multitude of collaborating peer agents.

-Steve


Stephen L. Reed

Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860





  




Re: [agi] database access fast enough?

2008-04-17 Thread Matt Mahoney
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote:

 For those using database systems for AGI, I'm wondering if the data
 retrieval rate would be a problem.

I analyzed the scalability of distributed indexing for my thesis (linked at
http://www.mattmahoney.net/agi.html ).  For data randomly distributed in a
vector space model up to log n dimensions, storage is O(n log n), retrieval
time is effectively O(log n) and update is O(log^2 n).  In practice you can do
better because data tends to cluster, reducing the effective number of
dimensions, and because accesses tend to be distributed nonuniformly.  Data
accessed frequently will tend to be cached in nearby nodes.

I realize you are asking about the relational model, but you can implement the
most common transactions, e.g. retrieving or updating a small number of
records at a time, by storing records of the form (author, timestamp, table,
field=value, field=value, ...).  This also gives you transaction logging,
rollback, and authentication, which will be important in any database with
lots of users (I assume AGI).  However I don't think it will be as powerful as
records of the form (author, timestamp, arbitrary_text).
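A minimal sketch of that record format in Python (a simple counter
stands in for real timestamps, and the helper names are invented):
updates are append-only log records, the current state is a replay of
the log, and rollback is a replay truncated at an earlier timestamp.

```python
import itertools

_clock = itertools.count(1)   # stand-in for real timestamps
log = []                      # append-only: (author, ts, table, key, fields)

def put(author, table, key, **fields):
    log.append((author, next(_clock), table, key, fields))

def state_at(t=float("inf")):
    """Replay log entries with timestamp <= t into {table: {key: fields}}."""
    snapshot = {}
    for _author, ts, table, key, fields in log:
        if ts <= t:
            snapshot.setdefault(table, {}).setdefault(key, {}).update(fields)
    return snapshot

put("yky", "books", "hp", title="Harry Potter", genre="fantasy")
t0 = next(_clock)
put("yky", "books", "hp", genre="children's fantasy")  # later correction

current = state_at()        # sees the correction
rolled_back = state_at(t0)  # rollback: the state before the correction
assert rolled_back["books"]["hp"]["genre"] == "fantasy"
```

Because every record carries its author and timestamp, logging,
rollback, and auditing fall out of the representation for free, as the
paragraph above notes.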

 To use an example,
 
 If a lot of people search for Harry Porter, then a conventional
 database system would make future retrieval of the Harry Porter node
 faster.
 
 But the requirement of the inference system is such that, if Harry
 Porter is fetched, then we would want *other* things that are
 associated with Harry Porter to be retrieved faster in the future, for
 example items such as JK Rowling or fantasy fiction.

A huge relational database would retrieve the fact that Harry Porter won a
gold medal for the high jump in the 1908 Olympics.  A better language model
(like Google) might figure out that you meant Harry Potter :-)


-- Matt Mahoney, [EMAIL PROTECTED]



Re: [agi] database access fast enough?

2008-04-17 Thread Mark Waser
Clustered indexing *WILL* solve your problem if you're willing to include 
all the data you're going to need in the index.  It's definitely a trade-off 
. . . . but arguably a solid one.


- Original Message - 
From: YKY (Yan King Yin) [EMAIL PROTECTED]

To: agi@v2.listbox.com
Sent: Thursday, April 17, 2008 1:03 PM
Subject: Re: [agi] database access fast enough?



On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:


Again, most good database engines can do this, as it is a standard access
pattern for databases, and most databases can solve this problem multiple
ways.  As an example, clustering and index-organization features in
databases address your issue here.


Thanks... clustered indexing looks promising, but I need to study it
in more detail to see if it really solves the problem...

YKY







Re: [agi] database access fast enough?

2008-04-17 Thread YKY (Yan King Yin)
On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote:

 Yes.  RAM is *HUGE*.  Intelligence is *NOT*.

Really?  I will believe that if I see more evidence... right now I'm skeptical.

Also, I'm designing a learning algorithm that stores *hypotheses* in
the KB along with accepted rules.  This will multiply the size of the
KB by a factor.

YKY

PS:  In my last message, contagious should be contiguous... =)



Re: [agi] database access fast enough?

2008-04-17 Thread J. Andrew Rogers


On Apr 17, 2008, at 9:08 AM, Mark Waser wrote:
Yes, the newest versions of PostgreSQL could spank SQL Server 2000  
after it was several years old.  One tremendous advantage of  
PostgreSQL is its very short development cycle.



Actually, this was a fundamental and known weakness in the SQL Server  
2000 transactional model, being more like DB2 than Oracle.  Because  
PostgreSQL has used the same kind of model as Oracle -- and for a very  
long time -- it has always been relatively strong at OLTP throughput.   
Until SQL Server 2005, the Microsoft offering was never really  
competitive.  It had little to do with development timelines.  On the  
other hand, PostgreSQL was a bit of a dog at OLAP until relatively  
recently.


You imply that the performance is due to some kind of linear  
development path, but in fact SQL Server 2005 changed its internal  
model to be like Oracle and PostgreSQL so that it could be competitive  
at OLTP.  It is a matter of algorithm selection and tradeoffs, not  
engineering effort.  SQL Server (until two years ago) has always had  
relatively poor lock concurrency, but gave very good baseline OLAP  
performance as a consequence of that decision.  The reality is that it  
is much easier to make the Oracle/Postgres model perform  
satisfactorily at OLAP than to make the old SQL Server model perform  
satisfactorily at OLTP.



-- in 2005 they  switched to a transaction model more like  
PostgreSQL and Oracle and  gained some parity.  SQL Server still  
does not really distribute all  that easily, unlike Oracle or  
PostgreSQL.


Have you ever worked with an Oracle distributed database?  Oracle  
does not distribute well.



I've worked with very large databases on several major platforms,  
including Oracle and SQL Server in many different guises.  Oracle's  
parallel implementation may not distribute that well, but that is  
because traditional transactional semantics are *theoretically  
incapable* of distributing well.  To the extent it is possible at all,  
Oracle does a very good job at making it work.


There are new transactional architectures in academia that should work  
better in a modern distributed environment than any of the current  
commercial adaptations of classical architectures to distributed  
environments.



Oracle only scales well when you know how to properly use it.  In  
most installations that I've seen, Oracle underperforms even SQL  
Server 2000 because the DBA didn't do the necessary to make it  
perform optimally (because Oracle is *NOT* average person  
friendly).  I've made *a lot* of money optimizing people's Oracle  
installations that I shouldn't have been able to make if Oracle  
could get out of it's own way.



No argument here, one of the major problems of Oracle is that it is  
bloody impossible to use well without a full-time staff.  I spent many  
years solving scaling problems on extremely large Oracle systems.  The  
insidiousness of PostgreSQL in the market is that it is very  
Oracle-like at a high level but *massively* simpler and easier to use and  
administer while still delivering much of the performance and a  
significant subset of the features of Oracle.  SQL Server has done  
well against Oracle for similar reasons.


The main problem with SQL Server these days is that it does not run on  
Unix.  Most of the major historical suckiness does not apply to the  
current version.




Making deep and very flexible customization a safe  core feature  
was a design decision tradeoff in PostgreSQL that is  somewhat  
unique to it. You can do a lot of really cool software   
implementation tricks with it that Oracle and SQL Server do not do.


Yes.  The biggest problems with PostgreSQL are that it doesn't have  
a Microsoft compatibility mode and it isn't clear to corporations  
where you can get *absolutely guaranteed* support.



Sun Microsystems not only officially supports it, they do a lot of  
development on it, as does Fujitsu in Asia, Red Hat and a few other  
large companies that are heavily invested in it.  A significant  
portion of the main PostgreSQL developers do it as their official  
corporate job.


PostgreSQL is very broadly ANSI compatible (including a lot of  
ancillary database standards surrounding SQL), and to the extent it  
has a flavor it clearly borrows from Oracle rather than SQL Server.   
SQL Server has a lot of bits that do not conform to standards that  
everyone else supports. From a historical perspective, PostgreSQL  
shares a transaction model with Oracle, started on Unix, and has been  
around since a time when SQL Server was not something you would want  
to emulate.  PostgreSQL has matured to the point where it mostly  
follows standards to the extent possible but has enough unique  
features and capabilities that it has started to become a flavor of  
its own.



If you could swap out an MS-SQL server *immediately* for a  
PostgreSQL server simply by copy the data and rebinding a WINS 

Re: [agi] database access fast enough?

2008-04-17 Thread Steve Richfield
Everyone,

At startup, I simply had Dr. Eliza cycle through the heavily used part of
the DB, so that it would run in RAM except for unusual accesses. Of course,
its demo DB now easily fits into RAM. VM paging was a MUCH worse problem
than DB access. I suspect that unless you lock the code into RAM, this may
well forever be the case, because less-used routines (e.g. exception
handlers) will get pushed out of RAM by the DB engine's scramble for buffer
space -- which of course you can limit by tweaking the DB engine.

Also, has anyone here looked at using flash disks for the DB? Vista now
puts VM onto any available flash drives to gain performance.

On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote:

 That's true as of now, but let's think one or two steps further:  Do
  you really think a mature AGI's (say with 3-6 year-old human
  intelligence) KB can reside in RAM, entirely?
 

 Yes.  RAM is *HUGE*.  Intelligence is *NOT*.


Hmm, thinking on the keyboard...
~100E9 computing cells with ~50K inputs each, of which ~200 are active.
One theory is that you would only have to carry the active inputs, plus some
fraction of the inactive inputs while you watched for things to happen to
make them active. Let's say that we must track ~1E3 inputs, for a total
of 100E12 or one hundred trillion inputs. We could use fractal means to
generate the original configuration (as biological brains probably do), very
low precision arithmetic with statistical rounding, etc., which would reduce
each input to just a few bytes to maintain, say ~10. This makes a total of
1E15 or one quadrillion bytes to represent a simulated human's instantaneous
state of construction. An entire checkpoint would take little more, because
it would only include in addition the electrical state of each of the 100E9
cells.

Note, however, that the *FUNCTIONAL* state would be only 1/5 of this
estimate, because 4/5 of the represented inputs are presently inactive --
a total of about 200 terabytes.

Note that ~90% of those 100E9 cells are slow-responding glial cells, so
while the state is large, the computational requirements may be well short
of a petaflop.
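The arithmetic above checks out as follows, in exact integer form (all
of the neuroscience inputs are Steve's stated assumptions, not
established figures; note the active-only state works out to about
200 TB):

```python
# Reproducing the back-of-envelope estimate with exact integers.
cells = 100 * 10**9        # ~100E9 computing cells (assumption)
tracked = 10**3            # inputs tracked per cell (assumption)
active = 200               # active inputs per cell (assumption)
bytes_per_input = 10       # low-precision bytes per input (assumption)

total_inputs = cells * tracked                      # 10**14: 100 trillion
full_state_bytes = total_inputs * bytes_per_input   # 10**15: ~1 petabyte
functional_bytes = full_state_bytes * active // tracked  # 2*10**14: ~200 TB
```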

Of course, this makes a LOT of assumptions that no one has yet bothered to
confirm in the laboratory, and I do NOT want to ignite an estimates war,
so I invite constructive comments from anyone with more recent data than I
have.

Steve Richfield



Re: [agi] database access fast enough?

2008-04-17 Thread Ben Goertzel
On Thu, Apr 17, 2008 at 2:42 PM, Mark Waser [EMAIL PROTECTED] wrote:

  Really, work on the AtomTable has been a small percentage of work on
  the Novamente Cognition Engine ... and, the code running the AtomTable is
  now pretty much the same as it was in 2001 (though it was tweaked to make
 it
  64-bit compatible, back in 2004 ... and there has been ongoing bug-removal
  as well...).
 

  And . . . and . . . and . . . :-)  It's far more than you're
 admitting to yourself.:-)

That's simply not true, but I know of no way to convince you.

The AtomTable work was full-time work for a two guys for a few months
in 2001, and since then it's been occasional part-time tweaking by two
people who have been full-time engaged on other projects.

  We wrote some new wrappers for the AtomTable
  last year (based on STL containers), but that didn't affect the
  internals, just the API.
 

  Which is what everything should have been designed around anyways -- so
 effectively, last year was a major breaking change that affected *all* the
 software written to the old API.

Yes, but calls to the AT were already well-encapsulated within the code,
so changing from the old API to the new has not been a big deal.

  Absolutely.  That's what I'm pushing for.  Could you please, please publish
 the 2007 AtomTable API?  That's actually far, far more important than the
 code behind it.  Please, please . . . . publish the spec today . . . .
 pretty please with a cherry on top?

It'll be done as part of the initial OpenCog release, which will be pretty
soon now ... I don't have a date yet though...

  However, I'm not convinced this would be a good idea.  There are a lot of
  useful specialized indices in the AtomTable, and replicating all this in
 some
  other graph DB would wind up being a lot of work ... and we could use that
  time/effort on other stuff instead
 

  Which (pardon me, but . . .  ) clearly shows that you're not a professional
 software engineer

I'm not but many other members of the Novamente team are

  My contention is that you all should be
 *a lot* further along than you are.  You have more talent than anyone else
 but are moving at a truly glacial pace.

90% of Novamente LLC's efforts historically have gone into various AI
consulting projects
that pay the bills.

Now, about 60% is going into consulting projects, and 40% is going
into the virtual
pet brain project

We have very rarely had funding to pay folks to work on AGI, so we've
worked on it
in bits and pieces here and there...

Sad, but true...

 I understand that you believe that
 this is primarily due to other reasons but *I am telling you* that A LOT of
 it is also your own fault due to your own software development choices.

You're wrong, but arguing the point over and over isn't getting us
anywhere.

  Worse, fundamentally, currently, you're locking *everyone* into *your*
 implementation of the atom table.

Well, that will not be the case in OpenCog.  The OpenCog architecture
will be such that other containers could be inserted if desired.

Why not let someone else decide whether
 or not it is worth their time and effort to implement those specialized
 indices on another graph DB of their choice?  If you would just open up the
 API and maybe accept some good enhancements (or, maybe even, if necessary,
 some changes) to it?

Yes, that's going to happen within OpenCog.

  Using a relational DB rather than a graph DB is not appropriate for the
 NCE
  design, however.
 

  Incorrect.  If the API is identical and the speed is identical, whether it
 is a relational db or a graph db *behind the scenes* is irrelevant.  Design
 to your API -- *NOT* to the underlying technology.  You keep making this
 mistake.

The speed will not be identical for an important subset of queries, because
of intrinsic limitations of the B-tree data structures used inside RDBs.  We
discussed this before.


  Seriously -- I think that you're really going to be surprised at how fast
 OpenCog might take off if you'd just relax some control and concentrate on
 the specifications and the API rather than the implementation issues that
 you're currently wasting time on.

I am optimistic about the development speedup we'll see from OpenCog,
but not for the reason you cite.

Rather, I think that by opening it up in an intelligent way, we're simply
going to get a lot more people involved, contributing their code, their
time, and their ideas.  This will accelerate things considerably, if all
goes well.

I repeat that NO implementation time has been spent on the AtomTable
internals for quite some time now.  A few weeks was spent on the API
last year, by one person.  I'm not sure why you want to keep exaggerating
the time put into that component, when after all you weren't involved in
its development at all (and I didn't even know you when the bulk of
that development was being done!!)

I don't care if, in OpenCog, someone replaces the AtomTable internals
with something 

Re: [agi] database access fast enough?

2008-04-17 Thread Mark Waser
Actually, this was a fundamental and known weakness in the SQL Server 
2000 transactional model, being more like DB2 than Oracle.


I disagree.  First off, we're talking about the DEFAULT transactional model, 
locking mode, and where new records are placed.  It has always been 
possible to tweak any of the databases to the other's transactional model. 
Second of all, it was not a weakness -- it was a deliberate choice of 
optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for 
most databases on limited memory machines with low OLTP requirements, this 
was the correct choice until ballooning memories made the reverse true).


Because  PostgreSQL has used the same kind of model as Oracle -- and for a 
very  long time -- it has always been relatively strong at OLTP 
throughput.   Until SQL Server 2005, the Microsoft offering was never 
really  competitive.


Bull.  For anything except the heaviest OLTP loads, Microsoft was more than 
adequate.  You don't need a semi to drive the highways.


It had little to do with development timelines.  On the  other hand, 
PostgreSQL was a bit of a dog at OLAP until relatively  recently.


See?  You're making my point.:-)

You imply that the performance is due to some kind of linear  development 
path, but in fact SQL Server 2005 changed its internal  model to be like 
Oracle and PostgreSQL so that it could be competitive  at OLTP.  It is a 
matter of algorithm selection and tradeoffs, not  engineering effort.  SQL 
Server (until two years ago) has always had  relatively poor lock 
concurrency, but gave very good baseline OLAP  performance as a 
consequence of that decision.  The reality is that it  is much easier to 
make the Oracle/Postgres model perform  satisfactorily at OLAP than to 
make the old SQL Server model perform  satisfactorily at OLTP.


Again, you're making my point.  Until memory became cheap and OLTP became 
more critical, Microsoft made the right choice of OLAP over OLTP.  When the 
world changed, so did they.  I'd call that a strength and flexibility, not a 
weakness.


I've worked with very large databases on several major platforms, 
including Oracle and SQL Server in many different guises.  Oracle's 
parallel implementation may not distribute that well, but that is  because 
traditional transactional semantics are *theoretically  incapable* of 
distributing well.  To the extent it is possible at all,  Oracle does a 
very good job at making it work.


So, is your claim that Oracle distributes better than Microsoft?  If so, 
why?


There are new transactional architectures in academia that should work 
better in a modern distributed environment than any of the current 
commercial adaptations of classical architectures to distributed 
environments.


And PostgreSQL will probably implement them long before Oracle or MS.

Sun Microsystems not only officially supports it, they do a lot of 
development on it, as does Fujitsu in Asia, Red Hat and a few other  large 
companies that are heavily invested in it.  A significant  portion of the 
main PostgreSQL developers do it as their official  corporate job.


Cool.  I wasn't aware that it had made that many inroads.  Awesome.


PostgreSQL is very broadly ANSI compatible (including a lot of  ancillary 
database standards surrounding SQL), and to the extent it  has a flavor 
it clearly borrows from Oracle rather than SQL Server.   SQL Server has a 
lot of bits that do not conform to standards that  everyone else supports. 
From a historical perspective, PostgreSQL  shares a transaction model with 
Oracle, started on Unix, and has been  around since a time when SQL Server 
was not something you would want  to emulate.  PostgreSQL has matured to 
the point where it mostly  follows standards to the extent possible but 
has enough unique  features and capabilities that it has started to become 
a flavor of  its own.



If you could swap out an MS-SQL server *immediately* for a PostgreSQL 
server simply by copying the data and rebinding a WINS name or an IP 
address, I would be in hog heaven even if support wasn't  absolutely 
guaranteed since I could always switch back. Given that  there's a huge 
transition cost (changing scripts, procedures, etc.),  I can't get *ANY* 
agreement for the thought of switching (and I'm  sure that there are 
*MANY* more in my circumstances).



The only corporate database that relatively easily ports back and  forth 
with PostgreSQL is Oracle. Nonetheless, a number of people have  ported 
applications to PostgreSQL from MS-SQL with good results;  questions about 
porting nuances come up regularly on the PostgreSQL  mailing lists.


Beyond your basic ANSI compliance, database portability only sort of 
exists.  Inevitably people use non-standard platform features that  expose 
the specific capabilities of the engine being used to maximize 
performance.  As a practical matter, you pick a database platform and 
stick with it as long as is reasonably possible.




Re: [agi] Comments from a lurker...

2008-04-17 Thread Steve Richfield
Mark,

On 4/16/08, Mark Waser [EMAIL PROTECTED] wrote:

   True, but this is inherent with ALL less than perfectly understood
 systems and is not in any way peculiar to Dr. Eliza. Extrapolations are
 inherently hazardous, sometimes without reasonable limit.

 Correct.  Part of the point to AGI is to automatically create knowledge
 bases that are as complete as possible.  Dr. Eliza seems to be a reasonable
 attempt to use a small amount of cherry-picked knowledge to solve a wide but
 not complete range of unsolved problems of a given type -- and has all of
 the standard inherent advantages and disadvantages of that approach.
 Wouldn't you agree?


Yes.


  There were a bunch of them and I don't claim to be a historian. As I
 understood those methods they used two kinds of expertise - one of which was
 similar to the symptoms and conditions that I use, and another that guided
 the repair process. Dr. Eliza does without the guidance. This has the
 advantage that it works with inept experts, and the disadvantage that it can
 be less efficient than if it had good guidance. I had to find a grand
 heuristic to replace expert-entered probabilities and the rest of that
 guidance. After lots of experimenting, that grand heuristic turned out to be
 incredibly simple, buried in the symptom weighting for various conditions,
 being that you count the first potential symptom (or its verified absence)
 as 80%, the next one as 80% * 20% = 16%, the third as 80% * 4% = 3%, etc.
 This gives a lot of weighting to the leading symptoms, but nonetheless
 seemed to work well.

 Wow!  That's a *really* wicked tail-off.  Seems really counter-intuitive.


Yes - it surprised me too, and it took a bunch of effort for me to get a
good handle on why it worked, because I REALLY don't like my software to
depend on things that I don't understand. It comes from Shannon's
information theory. The amount of information in a datum is most dependent
on the attendant noise. If you had a perfect symptom that exactly tracked
a cause-and-effect chain link, then you would do best to ignore all other
symptoms, regardless of whether they supported or contradicted the perfect
symptom. In our less-than-perfect world, the list of potentially useful
symptoms is usually short, and the noise comes from other cause-and-effect
chain links that may exhibit substantially identical symptoms. If you have
two symptoms, one with high noise and one with low noise, you do best by
substantially ignoring the noisy symptom. The key to separating links using
noisy symptoms is to use more than one noisy symptom, hopefully with
uncoupled noise. When your knowledge composer KNOWS about the 80% roll-off,
then they CAREFULLY select which symptoms to use and which to ignore, for a
secondary human effect of keeping the knowledge composer from throwing in
everything but the kitchen sink along with the dirty wash water.

Note further that unmentioned symptoms are NOT significantly considered in
computing the result, only those that are affirmed or denied. This means
that if ONLY the third symptom in the list is answered (one that would carry
just a 3% weight among others), it has a 100% effect on its own. This results in noisy
results - Dr. Eliza reports 100% interim probability, but fails to mention
the 50% noise factor, and continues to press the user to answer questions
about the two symptoms that precede the 3% symptom that is currently driving
everything. Note also that the 3% symptom is probably also driving other
potential conditions where it may be earlier in the list, and those
conditions may also be inserting their own questions. To separate the
various 100%s in interim results, I added a heuristic to slightly reduce the
100% results proportionately to how far down the list the first
confirmed/denied symptom is.
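For concreteness, the roll-off and the renormalization over answered symptoms can be sketched in a few lines (a hypothetical reconstruction from the description above, with an illustrative penalty constant and function name; this is not Dr. Eliza's actual code):

```python
def symptom_score(answers, rolloff=0.8, position_penalty=0.02):
    """Score one condition from its ordered symptom list.

    `answers` is aligned with the condition's symptom order:
    +1 = symptom affirmed, -1 = denied, None = not yet mentioned.
    Weights fall off geometrically: 0.8, 0.16, 0.032, ... so the
    leading symptoms dominate.  Only answered symptoms contribute;
    their weights are renormalized, and the result is slightly reduced
    the further down the list the first answered symptom sits (the
    tie-breaking heuristic for separating interim 100% results).
    """
    weights = [rolloff * (1 - rolloff) ** i for i in range(len(answers))]
    answered = [(w, a) for w, a in zip(weights, answers) if a is not None]
    if not answered:
        return 0.0
    total = sum(w for w, _ in answered)
    score = sum(w * a for w, a in answered) / total  # lands in [-1, +1]
    first = next(i for i, a in enumerate(answers) if a is not None)
    return score * (1 - position_penalty * first)
```

Note that a lone denied symptom drives the score negative, which is where the "negative results" described above come from.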

In typical use, there are often as many negative results (from denied
symptoms) as positive results! What could a negative probability possibly
mean? Not only do we have no believable evidence of the associated
condition, but if natural forces were to try to force it, those forces
would probably fail approximately the indicated percentage of the time.


  I'm not sure what you mean by guided the repair process


Where the expert's model of a decision tree, questioning, significance of
symptoms, etc., is used instead of the engine's own generated one that may
annoy the knowledge composer. It is interesting to watch others composing
for Dr. Eliza, because they have their own ideas how to proceed in the
presence of certain symptoms that may be of wide variance to Dr. Eliza's
approach. So far, discussing this with them at length has yielded that
there really isn't any good reason for doing it their way, and that by letting
Dr. Eliza do its own thing, inputting is a LOT easier. Note that there
are NO expert-entered percentages in the Knowledge.mdb, which seems to
result in BETTER operation because experts almost as often lead things
astray with myths as guide 

Re: [agi] associative processing

2008-04-17 Thread Steve Richfield
Derek

On 4/16/08, Derek Zahn [EMAIL PROTECTED] wrote:

 Steve Richfield, writing about J Storrs Hall:

  You sound like the sort that once the thing is sort of
  roughed out, likes to polish it up and make it as good as possible.

 I don't believe your characterization is accurate.  You could start with
 this well-done book to check that opinion:

 http://www.amazon.com/Beyond-AI-Creating-Conscience-Machine/dp/1591025117


Very interesting.


 Because you are new to the discussion here you probably don't quite get
 the topic of this mailing list (AGI);


I think that I do - see comments after addressing your other comments.

 the system sort-of described in your papers


I described TWO systems. The one in this thread I specifically designed with
a mind to eventually emulate YOU, neuron-by-neuron, synapse-by-synapse, in
real time.

The one mentioned in my Comments from a lurker thread mentions Dr. Eliza,
which is designed to solve difficult problems in simple ways that billions of
people have missed for a million years, and very likely ANY
astronomically-sized AGI machine would miss for centuries. It was unclear
how AGI was supposed to quickly do something that was only possible after
10E14 human years of wars and other strife, without having to go through,
and even potentially cause the same.

Proof by example to me, but apparently still not yet to the remainder of
this group, is that there ARE really important things that can only be
solved inductively, and that socialized AGI-like humans have SO little
inductive ability that even relatively simple concepts have simply escaped
human capabilities for a thousand millennia. I clearly understand this
because my own native inductive abilities are also in short supply. I had to
hang on by my fingernails just to get through differential equations. I
eventually developed my own assortment of mental crutches to survive my
shortfall in native inductive ability, which were subsequently expanded upon
to form Dr. Eliza's concept and innards.

 does not address any of the issues of that topic (as defined in its core
 publications and conferences) so don't be too surprised if people here are
 not particularly excited about it.


Hmm, I haven't seen a reference to those core publications. Is there a
semi-official list?

Much of what is presently known about human neuro-anatomy comes to people
from the writings of Dr. William Calvin. I was his assistant at the U of W
Department of Neurological Surgery. That was AFTER I had performed one of
the first neurological simulations and the first known to have categorized
inputs via unsupervised learning. We held each other's feet to the fire, me
for being wet-science correct, and Calvin for models that performed
good-math computations. Everyone knows about synapses performing weighted
accumulations, but few people know that many/most integrate and
differentiate, and that inhibitory synapses are typically VERY non-linear
with some VERY interesting transfer function, etc.

I published a paper at the first IJCNN in San Diego explaining how
everything pointed to wet neurons generally computing with the logarithms of
probabilities of assertions being true. That simple fact should have guided
future research, but with lab researchers not being mathematicians and not
going to NN conferences, this guiding fact has died away like the echo of
some long-forgotten noise. When a tree falls in the forest...

My son has beliefs that closely match those expressed by others on this
forum, and we sometimes have long arguments about what is and is not
reasonable for a human scale neural simulation program - beyond more
all-too-human stupidity.

My son has also developed the best known (and acknowledged as such at an
unrelated WORLDCOMP presentation) general purpose neural net simulation
program that runs on a PC, that is at once fast, flexible, and
well-instrumented. It has good-looking graphics (that look like contemporary
test instruments with fantastic abilities) and is able to stick its
tentacles deeply into other applications (like flight simulator) to provide
interactive input. I give him all the support that I can, but I still
question where this is all going. His program is (presently) written in
VB.net, converted from its earlier VB.

My own personal interest is in living forever, but regardless of how
expanded my brain might become, I suspect that I will STILL have the
shortcomings that this sort of architecture brings with it, scary though
that might be. THAT was part of my motivation for designing Dr. Eliza, which
(it appears to me) could quickly (like in a year of adequate funding) grow
beyond any AGI's future problem-solving abilities. It may take the likes of
an evolved Dr. Eliza to provide the problem solving ability needed to design
the AGI that people are discussing here.

Steve Richfield

---
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: 

Re: [agi] database access fast enough?

2008-04-17 Thread J. Andrew Rogers


On Apr 17, 2008, at 12:20 PM, Mark Waser wrote:
It has always been possible to tweak any of the databases to the  
other's transactional model.



Eh? Choices in concurrency control and scheduling run very deep in a  
database engine, with ramifications that cascade through every other  
part of the system.  Equivalent transaction isolation levels can  
behave very differently in practice depending on the internal  
transaction representation and management model.  You cannot turn off  
these side-effects, and you cannot tweak a non-MVCC-ish model to  
behave like an MVCC-ish model at runtime in any way that matters.
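To make concrete how deep the model runs, here is a toy sketch of MVCC-style snapshot visibility (a hypothetical illustration only, not any engine's actual code; real engines track commit logs, snapshot arrays, hint bits, and much more):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    xmin: int            # transaction id that created this version
    xmax: Optional[int]  # transaction id that superseded it, if any

def visible(v: RowVersion, snapshot_xid: int, committed: set) -> bool:
    """Heavily simplified MVCC visibility test: a reader sees a version
    whose creator committed before its snapshot, unless that version was
    superseded by a transaction that also committed before the snapshot.
    Readers never block writers; they just pick among retained versions."""
    created = v.xmin in committed and v.xmin < snapshot_xid
    superseded = (v.xmax is not None and v.xmax in committed
                  and v.xmax < snapshot_xid)
    return created and not superseded
```

An old snapshot keeps seeing the pre-update version while a newer snapshot sees the replacement; a lock-based engine has no retained versions to choose among, which is why the behavior cannot be tweaked in at runtime.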



Second of all, it was not a weakness -- it was a deliberate choice  
of optimization -- it was a choice of OLAP over OLTP (and, let's be  
honest, for most databases on limited memory machines with low OLTP  
requirements, this was the correct choice until ballooning memories  
made the reverse true).



The rise of the Internet, with its massive OLTP load characteristic,  
kind of settled the issue.  It is true though that Oracle-like OLTP  
monsters have significantly higher resource overhead for storing the  
same set of records.  These days it is concurrency bottlenecks that  
will kill you.



So, is your claim that Oracle distributes better than Microsoft?  If  
so, why?



Very mature implementation of the concepts, and almost every  
conceivable mechanism and model for doing it is hidden under the  
hood.  Remember, they started introducing the relevant concepts ages  
ago in Oracle 7, though in practice it was mostly unusable until  
relatively recently.   Consequently, their implementation is easily  
the most general in that it works moderately well across the broadest  
number of use cases because they've been tweaking that aspect for  
years.  Other commercial implementations tend to only work for a much  
narrower set of use cases.  In short, Oracle has a long head start.



There are new transactional architectures in academia that should  
work better in a modern distributed environment than any of the  
current commercial adaptations of classical architectures to  
distributed environments.


And PostgreSQL will probably implement them long before Oracle or MS.



Ironically, a specific design decision that has created a fair amount  
of argument for years makes PostgreSQL the engine starting from the  
closest design point.  PostgreSQL does not support threading and only  
uses a single process per query execution, originally for portability  
and data safety reasons -- the extreme hackability would be difficult  
to do otherwise.  This made certain types of trivial parallelism for  
OLAP difficult.  On the other hand, it has had distributed lock  
functionality for a number of versions now.


If you look at newer models explicitly designed to make transactional  
database scale better across distributed systems, you find that they  
are built on a design requirement of single processes per resource,  
strict access serialization, no local parallelism, and distributed  
locks.  Which is not that far removed from where PostgreSQL is today,  
if you remove massive local concurrency support and its high overhead.  
There are a number of outfits (see www.greenplum.com for a very  
advanced implementation) that have hacked PostgreSQL to scale across  
very large clusters for OLAP by essentially making the necessary  
tweaks to approximate these types of models.  The next step would be  
to rip out a lot of expensive bits based on classical design  
assumptions that make distributed write loads scale poorly.


In a sense, a design choice that has traditionally put some limits on  
scaling PostgreSQL for OLAP put it in exactly the right place to make  
implementation of next-generation architectures as natural of an  
evolution as can be expected in this case.



J. Andrew Rogers



Re: [agi] database access fast enough?

2008-04-17 Thread J. Andrew Rogers


On Apr 17, 2008, at 12:26 PM, Mark Waser wrote:
Actually, it's far worse than that.  For serious systems, most of  
the heavy lifting is done inside the database with stored procedures  
which are not standard AT ALL.  SQL is reasonably easy to port.   
Stored procedures that do a lot of work are not.



The standard is SQL/PSM, which looks similar to Oracle's PL/SQL (and  
PostgreSQL's pl/pgsql).  As a practical matter, support is not  
consistent enough or widespread enough for it to be entirely usable  
for purposes of portability though it is getting better.


To be fair, full SQL/PSM support will not be core in PostgreSQL until  
the next release.


J. Andrew Rogers



RE: [agi] associative processing

2008-04-17 Thread Derek Zahn
Steve Richfield writes:
 
 Hmm, I haven't seen a reference to those core publications. Is there a 
 semi-official list?
 
This list is maintained by the Artificial General Intelligence Research 
Institute.  See www.agiri.org .  On that site there are several semi-official 
lists -- under Publications and Instead of an AGI Textbook.
 
Certainly there is very little agreement (on anything!) amongst the 
idiosyncratic group of people who post on this list and I did not intend to 
dissuade you from presenting your ideas (which I have found interesting so far, 
in proportion to the degree they address AGI topics); I was just explaining why 
people here are unlikely to find Dr. Eliza to be particularly interesting.
 
 



Re: [agi] Sending attachments to the list

2008-04-17 Thread Steve Richfield
Richard,

I presume that you were referring to (worst offender) ME here!

On 4/16/08, Richard Loosemore [EMAIL PROTECTED] wrote:

 Just a quick reminder about list protocol:  if you want to send someone a
 document (especially a pdf), please remember to send it to their personal
 email address, rather than send it to the entire list.


One such PDF ended up connecting with Josh who is REALLY into hardware
design, and a really interesting thread developed that, who knows, may lead
to the magic chip needed to implement AGI. This alone may be worth the
overhead!

However, I DID send a bunch of stuff before I realized that it was going to
the entire list - sorry about that. I will limit myself to at most one
CAREFULLY-chosen PDF per posting in the future.

I propose that someone move this list to Yahoo, which provides storage space
along with many other useful tools, like surveys.

Or, better yet, make it available on a website.

 Some of us still collect their mail on a low bandwidth connection
 sometimes,


Including me.

and it can be hell to wait 20 minutes just to check your mail.


Not with Gmail and other web-based services. They keep the attachments on
their servers until they are clicked on. Who the heck uses POP3 with dialup?

Steve Richfield




RE: [agi] associative processing

2008-04-17 Thread Derek Zahn
Note that the Instead of an AGI Textbook section is hardly fleshed out at all 
at this point, but it does link to a more-complete similar effort to be found 
here:
 
http://nars.wang.googlepages.com/wang.AGI-Curriculum.html



Re: [agi] database access fast enough?

2008-04-17 Thread Stephen Reed
YKY,

I agree with your side of the debate about the whole KB not fitting into RAM.  As a 
solution, I propose to partition the whole KB into the tiniest possible cached 
chunks, suitable for a single agent running on a host computer with RAM 
resources of at least one GB.  And I propose that AGI will consist not of one 
program running on one computer, but a vast multitude of separately hosted 
agents working in concert.

But my opinion of the OpenCyc concept coverage with respect to that of a human 
five-year old differs greatly from yours.  I concede that 20 OpenCyc facts 
are about the number a child might know, but in order to properly ground these 
concepts, I believe that a much larger number of feature vectors will have to 
be stored or available in abstracted form.   For example, there is the concept 
of the child's mother.  Properly grounding that one concept might require 
abstracting features from thousands of observations:
wet hair mother, far away mother, angry mother, mother hidden from view,
mother in a crowd, mother's voice, mother in dim light, mother from below,
and so on.
Of course you can do without fully grounded concepts, as Cycorp currently does for its 
applications, and as I will with Texai until it is past the bootstrap stage.

-Steve


Stephen L. Reed

Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

- Original Message 
From: YKY (Yan King Yin) [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Thursday, April 17, 2008 3:58:43 PM
Subject: Re: [agi] database access fast enough?

 On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote:
   Yes.  RAM is *HUGE*.  Intelligence is *NOT*.
 
  Really?  I will believe that if I see more evidence... right now I'm
 skeptical.

 And your *opinion* has what basis?  Are you arguing that RAM isn't huge?
 That's easily disprovable.  Or are you arguing that intelligence is huge?
 That too is easily disprovable.  Which one do I need to knock down?

The current OpenCyc KB is ~200 MB (correct me if I'm wrong).

The RAM size of current high-end PCs is ~10 GB.

My intuition estimates that the current OpenCyc is only about 10%-40%
of a 5-year-old human's intelligence.

Plus, learning requires that we store a lot of hypotheses.  Let's say
1000-10000 times the real KB.

That comes to 500 GB - 20 TB.
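Spelling out the arithmetic behind that range (every input below is one of the guesses above, not a measurement; the 1000-10000x hypothesis multiplier is what the 500 GB - 20 TB result implies):

```python
# All inputs are the post's own guesses, not measured values.
opencyc_kb_gb = 0.2                   # ~200 MB current OpenCyc KB
coverage = (0.10, 0.40)               # OpenCyc as a fraction of a 5-year-old's knowledge
hypothesis_factor = (1_000, 10_000)   # stored hypotheses relative to the real KB

# Smallest total: generous coverage estimate, few hypotheses.
low_gb = (opencyc_kb_gb / coverage[1]) * hypothesis_factor[0]    # 500 GB
# Largest total: stingy coverage estimate, many hypotheses.
high_gb = (opencyc_kb_gb / coverage[0]) * hypothesis_factor[1]   # 20,000 GB = 20 TB
```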

It seems that if we allow several years for RAM size to double a few
times, RAM may have a chance to catch up to the low end.  Obviously
not now.

YKY








  




Computational requirements of AGI (Re: [agi] database access fast enough?)

2008-04-17 Thread Matt Mahoney
--- Steve Richfield [EMAIL PROTECTED] wrote:
 On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote:
 
  That's true as of now, but let's think one or two steps further:  Do
   you really think a mature AGI's (say with 3-6 year-old human
   intelligence) KB can reside in RAM, entirely?
  
 
  Yes.  RAM is *HUGE*.  Intelligence is *NOT*.
 
 
 Hmm, thinking on the keyboard...
 ~100E9 computing cells with ~50K inputs each, of which ~200 are active.
 One theory is that you would only have to carry the active inputs, plus some
 fraction of the inactive inputs while you watched for things to happen to
 make them active. Let's say that we must track ~1E3 inputs, for a total
 of 100E12 or one hundred trillion inputs. We could use fractal means to
 generate the original configuration (as biological brains probably do), very
 low precision arithmetic with statistical rounding, etc., which would reduce
 each input to just a few bytes to maintain, say ~10. This makes a total of
 1E15 or one quadrillion bytes to represent a simulated human's instantaneous
 state of construction. An entire checkpoint would take little more, because
 it would only include in addition the electrical state of each of the 100E9
 cells.
 
 Note however, that the *FUNCTIONAL* state would only be 1/5 of this estimate
 because 4/5 of the represented inputs are presently inactive, for a total of
 only 100 terabytes.
 
 Note that ~90% of those 100E9 cells are slow-responding glial cells, so
 while the state is large, the computational requirements may be well short
 of a petaflop.
 
 Of course, this makes a LOT of assumptions that no one has yet bothered to
 confirm in the laboratory, and I do NOT want to ignite an estimates war,
 so I invite constructive comments from anyone with more recent data than I
 have.

The Blue Brain project estimates 8000 synapses per neuron in mouse cortex.  I
haven't seen a more accurate estimate for humans, so your numbers are probably
as good as mine.  I estimate 10^11 neurons, 10^15 synapses (1 bit each) and a
response time of 100 ms, or 10^16 OPS to replicate the processing of a human
brain.
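As a sanity check, here is that estimate as arithmetic (every figure is a guess from this thread, not established neuroscience):

```python
# The thread's round-number guesses, not measurements.
neurons        = 1e11   # ~10^11 neurons
syn_per_neuron = 1e4    # ~10^4 synapses per neuron (Blue Brain's mouse figure is 8000)
response_s     = 0.1    # ~100 ms characteristic neural response time

synapses    = neurons * syn_per_neuron   # 10^15 synapses
memory_bits = synapses * 1               # ~1 bit per synapse -> 10^15 bits
ops_per_sec = synapses / response_s      # each synapse touched ~10x/s -> 10^16 OPS
```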

The memory requirement is considerably higher than the information content of
long term memory estimated by Landauer [1], about 10^9 bits.  This may be due
to the constraints of slow neurons, parallelism, and the pulsed binary nature
of nerve transmission.  For example, the lower levels of visual processing in
the brain involve massive replication of nearly identical spot filters which
could be simulated in a machine by scanning a small filter coefficient array
across the retina.  It also takes large numbers of nerves to represent a
continuous signal with any accuracy, e.g. fine motor control or distinguishing
nearly identical perceptions.

However, my work with text compression suggests that the cost of modeling 1 GB
of text (about one human lifetime's worth) is considerably more than a few GB
of memory.  My guess is at least 10^12 bits just for ungrounded language
modeling.  If the model is represented as a set of (sparse) graphs, matrices,
or neural networks, that's about 10^13 OPS.

Remember that the goal of AGI is not to duplicate the human brain, but to do
the work that humans are now paid to do.  It still requires solving hard
problems like language, vision, and robotics, which consume a significant
fraction of the brain's computing power.  But what matters is that the cost of
AGI be less than human labor, currently US $10K per year worldwide and growing
at 3-4% (5% GDP growth - 1.5% population growth).  If my guess is right and
Moore's law continues (halving costs every 1.5 to 2 years), then AGI is at
least 10-15 years away.  If it actually turns out there are no shortcuts to
simulating the brain, then it is 30 years away.
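
The timing argument can be sketched as a doubling calculation; the specific
inputs below (required vs. currently affordable capability, doubling period)
are illustrative placeholders, not figures from this message:

```python
import math

def years_until_affordable(required, affordable_now, doubling_years):
    """Years until `required` capability costs what `affordable_now`
    costs today, if cost per unit halves every `doubling_years`."""
    return math.log2(required / affordable_now) * doubling_years

# Hypothetical: hardware must get ~1000x cheaper per OPS.
low  = years_until_affordable(1e13, 1e10, 1.5)  # ~15 years
high = years_until_affordable(1e13, 1e10, 2.0)  # ~20 years
```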

1. Landauer, T. K., "How much do people remember? Some estimates of the
quantity of learned information in long-term memory," Cognitive Science 10(4),
pp. 477-493, 1986.



-- Matt Mahoney, [EMAIL PROTECTED]

---
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244id_secret=101455710-f059c4
Powered by Listbox: http://www.listbox.com


Re: [agi] associative processing

2008-04-17 Thread Matt Mahoney
--- Steve Richfield [EMAIL PROTECTED] wrote:
 The one mentioned in my Comments from a lurker thread mentions Dr. Eliza,
 which is designed to solve difficult problems in simple ways that billions of
 people have missed for a million years, and which very likely ANY
 astronomically-sized AGI machine would miss for centuries. It was unclear
 how AGI was supposed to quickly do something that was only possible after
 10E14 human-years of wars and other strife, without having to go through,
 and even potentially cause, the same.

As far as I can tell, it only gives medical advice based on your personal
agenda.  It knows only what you program into it.

 I published a paper at the first IJCNN in San Diego explaining how
 everything pointed to wet neurons generally computing with the logarithms of
 probabilities of assertions being true. That simple fact should have guided
 future research, but with lab researchers not being mathematicians, and
 neither going to NN conferences, this guiding fact has died away like the
 echo of some long-forgotten noise. When a tree falls in the forest...

I use the same technique in my PAQ7/8 data compressors (since Dec. 2005),
although I was not aware of your research.  A set of models independently
estimates the probabilities p(0), p(1) that the next bit of input will be a 0
or 1 based on past history in various contexts.  The predictions are mapped to
x = log(p(1)/p(0)), combined by weighted averaging, then mapped back to a
probability by the squashing function 1/(1+exp(-x)), which makes it a neural
network.  The weights are then adjusted to favor the most accurate predictions
in proportion to x*(actual - predicted), a simplification of back propagation
that minimizes coding cost rather than RMS prediction error.
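
As a rough sketch (my reconstruction from the description above, not the
actual PAQ source), the mixing and update steps look like:

```python
import math

def stretch(p1):
    """Map a probability to the log-odds domain: x = log(p(1)/p(0))."""
    return math.log(p1 / (1.0 - p1))

def squash(x):
    """Squashing function 1/(1+exp(-x)): back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def mix(predictions, weights):
    """Combine model predictions by weighted averaging in the stretched
    domain, then squash back to a single probability p(1)."""
    return squash(sum(w * stretch(p) for p, w in zip(predictions, weights)))

def update(predictions, weights, actual_bit, rate=0.002):
    """Adjust weights toward the more accurate models, in proportion to
    x * (actual - predicted) -- the simplified back-propagation step."""
    err = actual_bit - mix(predictions, weights)
    return [w + rate * stretch(p) * err
            for p, w in zip(predictions, weights)]

# Three models predict p(1) for the next bit; the bit turns out to be 1,
# so the weight of the most confident correct model grows the most.
preds, w = [0.9, 0.6, 0.3], [0.3, 0.3, 0.3]
w = update(preds, w, actual_bit=1)
```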

I should mention the technique works quite well.
http://www.maximumcompression.com/data/summary_sf.php


-- Matt Mahoney, [EMAIL PROTECTED]



Re: [agi] database access fast enough?

2008-04-17 Thread YKY (Yan King Yin)
On 4/18/08, Stephen Reed [EMAIL PROTECTED] wrote:

 I agree with your side of the debate about whole KB not fitting into RAM.  As 
 a solution, I propose to partition the whole KB into the tiniest possible 
 cached chunks, suitable for a single agent running on a host computer with 
 RAM resources of at least one GB.  And I propose that AGI will consist not of 
 one program running on one computer, but a vast multitude of separately 
 hosted agents working in concert.


Disk access rate is ~10 times faster than ethernet access rate.  IMO,
if RAM is not enough the next thing to turn to should be the harddisk.

Distributed AGI is a fascinating idea, but you have to solve a lot of
algorithmic problems to make it work.  If each agent has only a slice
of the full KB, the average commonsense query would require
cooperation among many agents.  That's a very challenging algorithmic
problem.  I'm content to do simple, single-machine AGI.


 But my opinion of the OpenCyc concept coverage with respect to that of a 
 human five-year old differs greatly from yours.  I concede that 20 
 OpenCyc facts are about the number a child might know, but in order to 
 properly ground these concepts, I believe that a much larger number of 
 feature vectors will have to be stored or available in abstracted form.   For 
 example, there is the concept of the child's mother.  Properly grounding that 
 one concept might require abstracting features from thousands of observations:

=
Yes, I actually agree with you -- I subconsciously tuned down my
estimates as I was talking to Mark =)

I think sensory processing is going to be a very hard problem, so we
should postpone sensory grounding as late as possible, and instead
focus on text.

Don't forget that the AGI needs to have *episodic* memory as well.  If
we include that, secondary storage is certainly needed.

YKY



Re: [agi] database access fast enough?

2008-04-17 Thread J. Andrew Rogers


On Apr 17, 2008, at 3:32 PM, YKY (Yan King Yin) wrote:

Disk access rate is ~10 times faster than ethernet access rate.  IMO,
if RAM is not enough the next thing to turn to should be the harddisk.



Eh?  Ethernet latency is sub-millisecond, and in a highly tuned system  
approaches the 10 microsecond range for something local.  Much, much  
faster than disk if the remote node has your data in RAM and is  
relatively local.


Note that relatively local can mean geographically regional.  The  
round-trip RAM access time from my machine to a machine on the other  
side of town is a fraction of a millisecond over the Internet connection  
(not hypothetical, actually measured at ~400 microseconds).  I wish  
disk access were even remotely that good.  And this was with  
inexpensive Gigabit Ethernet.
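
Such a measurement is easy to reproduce; a minimal sketch (the host and port
in the usage comment are placeholders for a nearby node running any TCP
service):

```python
import socket
import time

def tcp_round_trip(host, port, samples=5):
    """Average TCP connect time to (host, port) -- a rough proxy for
    network round-trip latency to a nearby node."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2.0):
            pass  # connection established; close immediately
        total += time.perf_counter() - start
    return total / samples

# e.g. tcp_round_trip("192.168.0.10", 22)  # hypothetical local node
```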


J. Andrew Rogers



[agi] Rationalism and Empricial Rationalism

2008-04-17 Thread Jim Bromer
From: Mark Waser [EMAIL PROTECTED]
Subject: Re: [agi] Rationalism and Scientific Rationalism: Was Logical
Satisfiability...Get used to it.

It looks as if you're saying that scientific rationalism must be grounded
but that rationalism in general need not be. Is this a correct
interpretation?
-

No, yes and I'm not sure.
I would like to write a message about an artificial rationalism and an
artificial empirical rationalism.  I am not going to try to write
about an AI architecture, but I do want to write in terms that can
lend themselves to a discussion of how rationalism and empirical
rationalism can be designed into an AGI program.  However, this is not
meant as a definitive statement on the various ways that the words and
concepts behind the words 'rationalism' and 'empiricism' are used.
(The phrase empirical rationalism is probably a better term for me to
use than scientific rationalism.)
But yes, in general, I feel that scientific rationalism and empirical
rationalism have to be more grounded than simple rationalism,
especially when we are trying to understand how these concepts can be
applied to an advanced AGI program.
But on the other hand, the concept of grounding may be too strong a
term.  Think of an AGI program that can learn from a natural language
text-based IO but does not have any other kind of IO.  I would argue
that there has to be a distinction between the definition of
rationalism (using some kind of applied logic-based systems) and
empirical rationalism (which also has some kind of experimental way of
grounding ideas and conjectures, and some kind of conceptual
integration as well).  The problem with this example, however, is that
the same conceptual functions are being used to devise conjectures
about the IO data environment as are used to test those conjectures.
So there is a real question about the depth of the 'grounding' since
the problem is so obviously tricky.  It is my belief that while the
concept of grounding is important for advanced AGI, it is itself no
more solid a premise than the other concepts used in AGI.  But I do
believe that some kind of 'grounding' is absolutely necessary for it.
Jim Bromer



RE: [agi] database access fast enough?

2008-04-17 Thread Gary Miller
YKY Said:

The current OpenCyc KB is ~200 MB (correct me if I'm wrong).
The RAM size of current high-end PCs is ~10 GB.
My intuition estimates that the current OpenCyc is only about 10%-40% of a
5-year-old human intelligence.
Plus, learning requires that we store a lot of hypotheses.  Let's say
1000-1 times the real KB.
That comes to 500 GB - 20 TB.
It seems that if we allow several years for RAM size to double a few
times, RAM may have a chance to catch up to the low end.  Obviously not now.

Don't forget about solid state drives (SSDs).

Currently, solid state drives speed up typical database applications by about
30 times.

And that's without stripping out all the old caching-overhead code that
databases use to handle the order-of-magnitude speed difference between RAM
and hard drives.

Large storage area network vendors like EMC are looking to SSDs to eliminate
I/O bottlenecks in corporate applications, where large data warehouses reach
20 TB very quickly.

And look for capacity to continue to double about every 18 months, driving
the price down very quickly.

And due to higher reliability and lower energy costs, it won't be too long
before hard drives join the ranks of 8-track tape players, record players,
and 5.25-inch diskettes.

http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci1300939,00.html#

http://www.storagesearch.com/ssd-fastest.html




Re: [agi] An Open Letter to AGI Investors

2008-04-17 Thread Benjamin Johnston


I have stuck my neck out and written an Open Letter to AGI (Artificial 
General Intelligence) Investors on my website at http://susaro.com.


All part of a campaign to get this field jumpstarted.

Next week I am going to put up a road map for my own development project.



Hi Richard,

If I were a potential investor, I don't think I'd find your letter 
convincing.


The term AI was first coined some 50 years ago: before I was born, and 
therefore long before I entered the field of AI. Naturally, I can't speak 
with personal experience on the matter, but when I read the early literature 
on AI or when I read about the field's pioneers reminiscing on the early 
days, I get the distinct impression that this was an incredibly 
passionate and excited group. I would feel comfortable calling them a 
gang of hot-headed revolutionaries - even today, 50 years after 
inventing the term AI and at the age of 80, McCarthy writes about AI 
and the possibility of strong AI with passion and excitement. Yet, in 
spite of all the hype, excitement and investment that was apparently 
around during that time (or, more likely, as a result of the hype and 
excitement), the field crashed in the AI winter of the 80s without 
finding that dramatic breakthrough.


There's the Japanese Fifth Generation Computer Systems project that I 
understand to be a massive billion dollar investment during the 80s into 
parallel machines and artificial intelligence; an investment that is 
today largely considered to be a huge failure.


And of course, there's Cyc; formed with an inspiring aim to capture all 
commonsense knowledge, yet it still remains in development some 20 years later.


And in addition to these, there are the many, many early research papers 
on AI problem-solving systems that show early promise and cause the 
authors to make wild predictions and claims in their Future Work... 
predictions that time has reliably proven to be false.


So, why would I want to invest now? When I track down the biographies of 
several of the regulars on this list, I find that they entered the field 
during or after the AI Winter and never experienced the early optimism 
as an insider. How can you convince an investor that the passion today 
isn't just the unfounded optimism of researchers who don't remember the 
past? How can you convince an investor that AGI isn't also going to 
devolve again into an emphasis on publications rather than quality (as 
you claim AI has devolved) or into a new kind of weak AGI with no 
dramatic breakthrough?


I think a better argument would be to point to a fundamental 
technological or methodological change that makes AGI finally credible. 
I'm not convinced that being lean, mean, hungry and hellbent on getting 
results is enough. If I believe in AGI, maybe my best bet is to invest 
my money elsewhere and wait until the fundamental attitudes have changed, 
so each dollar will have a bigger impact rather than be squandered on a 
bad dead-end idea. Alternately, my best bet may be to invest in weak AI 
because it will give me a short-term profit (that can be reinvested) AND 
has a plausible case for eventually developing into strong AI. If you 
can offer no good reason to invest in AGI today (given all its past 
failures), aside from the renewed passion of its researchers, then a sane 
reader would have to conclude that AGI is probably a bad investment.



Personally, I'm not sure what I feel about AGI (though, I wouldn't be 
here if I didn't think it was valuable and promising). However, in this 
email I'm trying to play the devil's advocate in response to your open 
letter to investors.


-Ben
