Re: [agi] database access fast enough?
On Apr 16, 2008, at 9:51 PM, YKY (Yan King Yin) wrote:

> Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

No, you are not correct about this. All good database engines use a combination of clever adaptive cache-replacement algorithms (read: keep the stuff you are most likely to access next in RAM) and cost-based optimization (read: optimize performance by adaptively selecting query-execution algorithms based on measured resource-access costs) to perform well across a broad range of use cases. For highly regular access patterns (read: similar query types and complexity), the engine will converge on very efficient access patterns and resource management that match this usage. For irregular access patterns, it will attempt to dynamically select the best options given recent access history and resource-cost statistics -- not always the best result (on occasion hand optimization could do better), but on average more likely to produce good results than simpler rule-based optimization.

Note that by "good database engine" I am talking about engines that actually support these kinds of tightly integrated and adaptive management features: Oracle, DB2, PostgreSQL, et al. This does *not* include MySQL, which is a naive and relatively non-adaptive engine, and which scales much worse and is generally slower than PostgreSQL anyway if you are looking for a free open-source solution.

I would also point out that different engines are optimized for different use cases. For example, while Oracle and PostgreSQL share the same transaction model, Oracle's design decisions are optimized for massive numbers of small concurrent update transactions, and PostgreSQL's design decisions are optimized for massive numbers of small concurrent insert/delete transactions.
Databases based on other transaction models, such as IBM's DB2, sacrifice extreme write concurrency for superior read-only performance. There are unavoidable tradeoffs with such things, so the market has a diverse ecology of engines that have chosen different sets of tradeoffs, and buyers should be aware of what these tradeoffs are if scalable performance is a criterion.

J. Andrew Rogers

---
agi Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?member_id=8660244id_secret=101455710-f059c4
Powered by Listbox: http://www.listbox.com
Re: [agi] An Open Letter to AGI Investors
IMHO: the stated expected benefit of AGI development is overly ambitious on the science/technology side and not ambitious enough on the social/economy side. For AGI to become the Next Big Thing it does not really have to come up with the best medical researcher. Nor would a great medical researcher have as much impact on the way current civilization works as the replacement of human workers in the services sector.

The impact of previous technology revolutions can be described in a very fundamental way as freeing (liberating? discharging?) people from engagement in hunting and similar, then agriculture and similar, then industry and similar. Well, industry is still in the works, and AGI could help there too, but the direction is clear. Services are the next area of human social/economic activity to benefit, and suffer, at the same scale as the others did earlier from technology. This is the most obvious general social role and selling point of AGI, at least until/unless it becomes a true deus ex machina ;) : to liberate (but also discharge, which is going to be a huge adoption/penetration problem) humans from engagement in providing economically significant services to other humans. Which such roles AGI addresses/fulfills, and how, should be the key metric if it is to be sold outside a community which is motivated by the intellectual challenge alone. So IMHO if you want to sell AGI to investors you had better start with replacing travel agents, brokers, receptionists, personal assistants, etc., rather than researchers.

Regards
Nikolay

Richard Loosemore wrote:

> I have stuck my neck out and written an Open Letter to AGI (Artificial General Intelligence) Investors on my website at http://susaro.com. All part of a campaign to get this field jumpstarted. Next week I am going to put up a road map for my own development project.
> Richard Loosemore

--
*Nikolay Ognyanov, PhD*
Chief Technology Officer
*TravelStoreMaker.com Inc.*
http://www.travelstoremaker.com/
Phone: +359 2 933 3832
Fax: +359 2 983 6475
Re: [agi] An Open Letter to AGI Investors
Nikolay Ognyanov wrote:

> IMHO: the stated expected benefit of AGI development is overly ambitious on the science/technology side and not ambitious enough on the social/economy side. [...] So IMHO if you want to sell AGI to investors you better start with replacing travel agents, brokers, receptionists, personal assistants etc. etc. rather than researchers.

I'm sorry, but this makes no sense at all: this is a complete negation of what AGI means. If you could build a (completely safe, I am assuming) system that could think in *every* way as powerfully as a human being, what would you teach it to become:

1) A travel agent.
2) A medical researcher who could learn to be the world's leading specialist in a particular field, and then be duplicated so that you instantly had 1,000 world-class specialists in that field.

3) An expert in AGI system design, who could then design a faster generation of AGI systems, so that, as a researcher in any scientific field, these second-generation systems could generate new knowledge faster than all the human scientists and engineers on the planet.

?

To say to an investor that AGI would be useful because we could use them to build travel agents and receptionists is to utter something completely incoherent. This is the "Everything Just The Same, But With Robots" fallacy.

Richard Loosemore
Re: [agi] database access fast enough?
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:

> No, you are not correct about this. All good database engines use a combination of clever adaptive cache replacement algorithms (read: keeps stuff you are most likely to access next in RAM) and cost-based optimization (read: optimizes performance by adaptively selecting query execution algorithms based on measured resource access costs) to optimize performance across a broad range of use cases. [...]
> There are unavoidable tradeoffs with such things, so the market has a diverse ecology of engines that have chosen a different set of tradeoffs and buyers should be aware of what these tradeoffs are if scalable performance is a criteria.

Thanks for the info -- I studied database systems almost a decade ago, so I can hardly remember the details =)

ARC (Adaptive Replacement Cache) seems to be one of the most popular methods, and it's based on keeping track of recently used and frequently used items. Unfortunately, for AGI / inference purposes, those may not be the right optimization objectives. The requirement of inference is that we need to access a lot of *different* nodes, but the same nodes may not be required many times. Perhaps what we need is to *bundle* up nodes that are associated with each other, so we can read a whole block of nodes with 1 disk access. This requires a very special type of storage organization -- it seems that existing DBMSs don't have it =(

YKY
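For what it's worth, the bundling idea can be sketched in a few lines. This is a toy illustration of the concept only, not a feature of any existing DBMS; SQLite and pickle are just convenient stand-ins, and the node names are made up. Each neighborhood of associated nodes is serialized into a single row, so one row fetch -- roughly one disk access -- returns the whole bundle:

```python
import pickle
import sqlite3

# Toy sketch: store a whole neighborhood of associated nodes as one
# serialized blob, so a single row fetch returns every node in the bundle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bundle (cluster_id TEXT PRIMARY KEY, nodes BLOB)")

def store_bundle(cluster_id, nodes):
    """Serialize a dict of associated nodes into one row."""
    conn.execute("INSERT OR REPLACE INTO bundle VALUES (?, ?)",
                 (cluster_id, pickle.dumps(nodes)))

def load_bundle(cluster_id):
    """One retrieval brings back the entire neighborhood."""
    row = conn.execute("SELECT nodes FROM bundle WHERE cluster_id = ?",
                       (cluster_id,)).fetchone()
    return pickle.loads(row[0]) if row else None

store_bundle("c42", {"A": {"links": ["B", "C"]},
                     "B": {"links": ["A"]},
                     "C": {"links": ["A"]}})
neighborhood = load_bundle("c42")
```

The open question, of course, is how to choose the bundles so that inference-time associates really do land in the same blob.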
Re: [agi] An Open Letter to AGI Investors
On Thursday 17 April 2008 04:47:41 am, Richard Loosemore wrote:

> If you could build a (completely safe, I am assuming) system that could think in *every* way as powerfully as a human being, what would you teach it to become: 1) A travel agent. 2) A medical researcher who could learn to be the world's leading specialist in a particular field, ...

Travel agent. Better yet, housemaid. I can teach it to become these things because I know how to do them. Early AGIs will be more likely to be successful at these things because they're easier to learn.

This is sort of like Orville Wright asking, "If I build a flying machine, what's the first use I'll put it to: 1) Carrying mail. 2) A manned moon landing."
Re: [agi] database access fast enough?
To use an example: if a lot of people search for Harry Potter, then a conventional database system would make future retrieval of the Harry Potter node faster. But the requirement of the inference system is such that, if Harry Potter is fetched, then we would want *other* things that are associated with Harry Potter to be retrieved faster in the future -- for example, items such as JK Rowling or fantasy fiction.

YKY
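The desired behavior can be sketched as a cache that prefetches associates on every miss. This is a hypothetical toy with a hard-coded three-node graph standing in for the knowledge base; no stock DBMS cache behaves this way out of the box:

```python
from collections import OrderedDict

# Toy graph: each node lists its associates. Illustrative data only.
GRAPH = {
    "Harry Potter": {"assoc": ["JK Rowling", "fantasy fiction"]},
    "JK Rowling": {"assoc": ["Harry Potter"]},
    "fantasy fiction": {"assoc": ["Harry Potter"]},
}

class AssociativeCache:
    """On a miss, load the node AND prefetch its associates into RAM."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.cache = OrderedDict()  # LRU order: oldest entries first

    def _insert(self, key):
        self.cache[key] = GRAPH[key]       # simulated disk read
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def get(self, key):
        if key not in self.cache:
            self._insert(key)
            for assoc in GRAPH[key]["assoc"]:  # prefetch associates too
                if assoc not in self.cache:
                    self._insert(assoc)
        self.cache.move_to_end(key)
        return self.cache[key]

cache = AssociativeCache()
cache.get("Harry Potter")
# After this call, "JK Rowling" and "fantasy fiction" are already warm.
```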
Re: [agi] An Open Letter to AGI Investors
J Storrs Hall, PhD wrote:

> On Thursday 17 April 2008 04:47:41 am, Richard Loosemore wrote:
>> If you could build a (completely safe, I am assuming) system that could think in *every* way as powerfully as a human being, what would you teach it to become: 1) A travel agent. 2) A medical researcher who could learn to be the world's leading specialist in a particular field, ...
> Travel agent. Better yet, housemaid. I can teach it to become these things because I know how to do them. Early AGIs will be more likely to be successful at these things because they're easier to learn.

Yes, that shows deep analysis and insight into the problem. I can just see the first AGI corporation now, having spent a hundred million dollars in development money, deciding to make a profit by selling a housemaid robot that will replace the cheap, almost-slave labor coming across the border from Mexico. Of course, it would not occur to that company to develop their systems just a little more and get the AGI to do high-value intellectual work.

Richard Loosemore
Re: [agi] database access fast enough?
No. You are not correct. Most DBMSs compile and optimize complex queries as a separate operation before doing data retrieval -- but even the most complex query is actually implemented as a series of simple retrievals (which is what the database is truly designed to do). On the other hand, communication to and from your database -- particularly across a network -- is very likely to be a speed problem. My solution is to actually implement your inference in the database engine. That way the database handles all of your memory management, caching, storage, etc., etc.

----- Original Message -----
From: YKY (Yan King Yin) [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Thursday, April 17, 2008 12:51 AM
Subject: [agi] database access fast enough?

> For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?
>
> YKY
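As a rough illustration of moving inference into the engine: with an in-process engine such as SQLite, application logic can be registered as a function the engine calls during its scan, so no per-row round trip is needed between the inference code and the database. The "confidence" predicate below is a made-up stand-in for real inference; a server-side engine would use its own procedural language instead.

```python
import sqlite3

# In-engine logic sketch: the engine evaluates our predicate row by row
# inside the scan, rather than shipping every row to the application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT, evidence REAL)")
conn.executemany("INSERT INTO node VALUES (?, ?)",
                 [("a", 0.9), ("b", 0.4), ("c", 0.7)])

def confident(evidence):
    """Toy inference predicate; threshold is arbitrary."""
    return 1 if evidence >= 0.5 else 0

conn.create_function("confident", 1, confident)

hits = [row[0] for row in conn.execute(
    "SELECT name FROM node WHERE confident(evidence) = 1 ORDER BY name")]
```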
Re: [agi] An Open Letter to AGI Investors
>> So IMHO if you want to sell AGI to investors you better start with replacing travel agents, brokers, receptionists, personal assistants etc. etc. rather than researchers.
>
> I'm sorry, but this makes no sense at all: this is a complete negation of what AGI means.

Actually . . . sorry, Richard . . . but why does it matter what AGI means? You are trying to sell a product for money. Why do you insist on attempting to sell someone something that they don't want just because *you* believe that it's better than what they do want? Why not just sell them what they want (since they get it for free with what you want) and be happy that they're willing to fund you?

> If you could build a (completely safe, I am assuming) system that could think in *every* way as powerfully as a human being, what would you teach it to become: 1) A travel agent. 2) A medical researcher 3) An expert in AGI system design,

4) All of the above. But I'd just market it as a travel agent to the people who want a travel agent and a medical researcher to the drug companies (the AGI expert would have it figured out but would have no spare cash :-).

> To say to an investor that AGI would be useful because we could use them to build travel agents and receptionists is to utter something completely incoherent.

Not at all. It is catering to their desires and refraining from forcibly educating them. Where is the harm? It's certainly better than getting the door slammed in your face.

> This is the Everything Just The Same, But With Robots fallacy.

No, it's not, because you're not saying that everything is going to be the same. All you're saying is that travel agents *can* be replaced, without insisting on pointing out that *EVERYTHING* is likely to be replaced.
Re: [agi] database access fast enough?
YKY,

Here is what I learned from implementing the Texai knowledge base. It persists symbolic statements about concepts.

1. I designed an SQL schema to persist OpenCyc in its full CycL form, in MySQL on SuSE 64-bit Linux. My Java application driving MySQL dramatically slowed down when the number of rows exceeded 20 million, as compared to the initial load of 5 million rows.
2. I then tried Oracle Berkeley DB Java Edition (open source), which provides no SQL query facility; instead one programs directly to its API for inserts, queries, updates, and so forth. It is faster than MySQL for my large KB, but uses four times as much disk space due to its method of inserting new rows at the end of the file and having lots of free space.
3. I then studied partitioning, which means breaking up the monolithic KB into smaller databases in which accesses are expected to be clustered. And I studied sharding, which means slicing up a database into logical segments that are hosted by separate DB engines, typically with separate disk filesystems.
4. I began writing my own storage engine for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer.
5. Revisiting my project's object persistence needs, and thinking more about interoperability with semantic web technologies, I decided to convert my existing KB to an RDF-compatible form and then to evaluate RDF quad stores.
6. After some analysis, I chose to evaluate the Sesame 2 RDF store, which is Java based and open source and thus very compatible with my other components. In Texai, RDF queries have a simpler form than SQL queries when retrieving logical statements from a store. For example, in SQL my schema had to provide separate tables for each object type: concept term, functional term, string, boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3 rule, arity-4 rule and arity-5 rule. Many of these tables would have to be joined for a typical query (e.g.
what concepts subsume a given concept?).

7. My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size, and I can therefore put any one of them in tmpfs, running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10x speedup.
8. When Texai is deployed, I expect that the application will log its transactions to disk as a background process, as a safeguard against losing the volatile KB in tmpfs.

Hope this information is useful.
-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

----- Original Message -----
From: YKY (Yan King Yin) [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Wednesday, April 16, 2008 11:51:35 PM
Subject: [agi] database access fast enough?

> [...]
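The sharding idea in step 3 can be sketched as hash-routing statements by subject, so that every statement about a given subject lands on the same shard (which, in a real deployment, would be a separate engine on its own filesystem). Shard count and the sample triples here are purely illustrative:

```python
import hashlib

# Sketch of hash-based sharding: route each statement to a shard by
# hashing its subject, so lookups for one subject always hit one shard.
NUM_SHARDS = 4

def shard_for(subject):
    """Deterministic shard assignment from the subject string."""
    digest = hashlib.md5(subject.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shards = {i: [] for i in range(NUM_SHARDS)}
for triple in [("TexaiKB", "subsumes", "Concept"),
               ("TexaiKB", "storedIn", "Sesame2"),
               ("Concept", "arity", "1")]:
    shards[shard_for(triple[0])].append(triple)
# Both "TexaiKB" statements necessarily land in the same shard.
```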
Re: [agi] An Open Letter to AGI Investors
Well, I haven't seen any intelligent responses to this, so I'll answer it myself:

On Thursday 17 April 2008 06:29:20 am, J Storrs Hall, PhD wrote:

> Travel agent. Better yet, housemaid. I can teach it to become these things because I know how to do them. Early AGIs will be more likely to be successful at these things because they're easier to learn. This is sort of like Orville Wright asking, "If I build a flying machine, what's the first use I'll put it to: 1) Carrying mail. 2) A manned moon landing."

Q: You've got to be kidding. There's a huge difference between a mail-carrying, fabric-covered, open-cockpit biplane and the Apollo spacecraft. It's not comparable at all.

A: It's only about 50 years' development. More time elapsed between railroads and biplanes.

Q: Do you think it'll take 50 years to get from travel agents to medical researchers?

A: No, the pace of development has sped up, and will speed up more so with AGI. But as in the mail/moon example, the big jump will be getting off the ground in the first place.

Q: So why not just go for the researcher?

A: Same reason Orville didn't go for the moon rocket. We build Rosie the maidbot first because:

1) We know very well what it's actually supposed to do, so we know if it's learning it right.
2) We even know a bit about how its internal processing -- vision, motion control, recognition, navigation, etc. -- works or could work, so we'll have some chance of writing programs that can learn that kind of thing.
3) It's easier to learn to be a housemaid. There are lots of good examples. The essential elements of the task are observable or low-level abstractions.
While the robot is learning to wash windows, we the AGI researchers are going to learn how to write better learning algorithms by watching how it learns.

4) When (not if) it screws up -- a natural part of the learning process -- there'll be broken dishes and not a thalidomide disaster.

The other issue is that the hard part of this is the learning. Say it takes a teraop to run a maidbot well, but a petaop to learn to be one. We run the learning on our one big machine and sell the maidbots cheap with 0.1% of the CPU. But being a researcher is all learning -- so each one would need the whole shebang for each copy. A decade of Moore's Law ... and at least that much of AGI research.

Josh
Re: [agi] database access fast enough?
And, as far as I'm concerned, the last clause of item 4 and the transition to 5 and 6 clearly demonstrate why Steve seems to be making a lot of progress compared to everyone else.

----- Original Message -----
From: Stephen Reed
To: agi@v2.listbox.com
Sent: Thursday, April 17, 2008 10:23 AM
Subject: Re: [agi] database access fast enough?

> 4. I began writing my own storage engine for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer.
> 5. Revisiting my project's object persistence needs, and thinking more about interoperability with semantic web technologies, I decided to convert my existing KB to an RDF-compatible form and then to evaluate RDF quad stores.
> 6. After some analysis, I chose to evaluate the Sesame 2 RDF store, which is Java based and open source and thus very compatible with my other components.
> [...]
Re: [agi] database access fast enough?
On Apr 17, 2008, at 2:50 AM, YKY (Yan King Yin) wrote:

> ARC (Adaptive Replacement Cache) seems to be one of the most popular methods, and it's based on keeping track of frequently used and recently used. Unfortunately, for AGI / inference purposes, those may not be the right optimization objectives.

It is a cache replacement algorithm; what would be a "right" optimization objective for such an algorithm? There is a lot of cleverness in the use of the cache to maximize cache efficiency beyond the cache replacement algorithm -- it is one of the most heavily engineered parts of a database engine. As an FYI, ARC is patented by IBM. PostgreSQL uses a different but similar algorithm that is indistinguishable from ARC in benchmarks (having implemented ARC briefly, not realizing that it was patented).

> The requirement of inference is that we need to access a lot of *different* nodes, but the same nodes may not be required many times. Perhaps what we need is to *bundle* up nodes that are associated with each other, so we can read a whole block of nodes with 1 disk access. This requires a very special type of storage organization -- it seems that existing DBMSs don't have it =(

Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here. It is pretty difficult to generate an access-pattern use case that they cannot be optimized for with a good database engine. They are very densely engineered pieces of software, designed to be very fast while scaling well in multiple dimensions and adapting to varying workloads. On the other hand, if your use case is simple enough, you can gain some significant speed for modest effort by writing your own engine that is purpose-built to be optimized for your needs.

J. Andrew Rogers
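The clustering / index-organization point can be illustrated with a small, checkable example; SQLite is just a convenient stand-in here for the engines under discussion. A WITHOUT ROWID table is stored as a B-tree keyed on its primary key, so rows sharing a cluster_id prefix are physically adjacent and a single range scan retrieves the whole cluster. The schema and data are illustrative only:

```python
import sqlite3

# Index-organized storage sketch: the table IS the primary-key B-tree,
# so rows with the same cluster_id sit together on adjacent pages.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        cluster_id INTEGER,
        node_id    INTEGER,
        payload    TEXT,
        PRIMARY KEY (cluster_id, node_id)
    ) WITHOUT ROWID
""")
rows = [(c, n, f"node-{c}-{n}") for c in range(3) for n in range(4)]
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)

# One range scan pulls an entire co-located cluster:
cluster1 = conn.execute(
    "SELECT payload FROM node WHERE cluster_id = 1 ORDER BY node_id"
).fetchall()
```

PostgreSQL's CLUSTER command and Oracle's index-organized tables achieve the analogous physical co-location.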
Re: [agi] database access fast enough?
On Apr 17, 2008, at 6:07 AM, Mark Waser wrote:

> I have to laugh at your total avoidance of Microsoft SQL Server, which is arguably faster and better scaling for truly mixed use than everything except possibly Oracle on ordinary hardware; which is much easier to use than Oracle; and which is the easiest to actually put *GOOD* code in the database engine itself (particularly when compared to Oracle's *REALLY* poor Java imitation).

Discussing SQL Server does not generalize well, in that they reimplement the core engine design with almost every release, once they realize they hosed the design with the last release. For example, up until SQL Server 2005 the transaction engine was weak enough that PostgreSQL could spank it in transaction throughput -- in 2005 they switched to a transaction model more like PostgreSQL's and Oracle's and gained some parity. SQL Server still does not really distribute all that easily, unlike Oracle or PostgreSQL. SQL Server versions before the current two-year-old one were pretty much dogs in a lot of ways; the most recent version is, as you state, a pretty solid database engine. Oracle is a major pain in the ass to use but does scale well, though for many OLTP loads it is barely faster than PostgreSQL these days.

If putting your code in the engine is the goal, PostgreSQL wins by a country mile. The entire engine from front to back is deeply hackable with very clean APIs, and you can even safely bind binary code into the engine at runtime. That the transaction engine scales quite well is just a bonus. People have already written hooks for a dozen languages into it. I've written performance-sensitive customizations of PostgreSQL in the past, and for purposes like that it can often be much faster than the commercial alternatives, as the alternatives tend to be relatively feature-poor and shallow when it comes to engine customization.
Making deep and very flexible customization a safe core feature was a design decision tradeoff in PostgreSQL that is somewhat unique to it. You can do a lot of really cool software implementation tricks with it that Oracle and SQL Server do not do. J. Andrew Rogers
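Andrew's point about safely binding your own code into the engine can be illustrated portably. In PostgreSQL this would be CREATE FUNCTION with PL/pgSQL or a loadable C module; since that needs a running server, this hedged sketch uses the analogous hook in SQLite from the Python standard library. The `similarity` function and the `facts` table are invented for illustration, not part of any real schema.

```python
import sqlite3

# Stand-in for in-engine custom code: SQLite, like PostgreSQL's
# CREATE FUNCTION, lets you register application code that runs
# inside the query executor rather than in a client-side loop.
conn = sqlite3.connect(":memory:")

def similarity(a, b):
    """Toy word-overlap similarity, callable directly from SQL."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

conn.create_function("similarity", 2, similarity)
conn.execute("CREATE TABLE facts (body TEXT)")
conn.execute("INSERT INTO facts VALUES ('JK Rowling wrote Harry Potter')")

# The custom function now participates in query evaluation:
row = conn.execute(
    "SELECT body FROM facts WHERE similarity(body, 'Harry Potter books') > 0.1"
).fetchone()
```

The same pattern scales up in PostgreSQL, where the bound code can additionally be used inside indexes and aggregates.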
Re: [agi] An Open Letter to AGI Investors
We may well see a variety of proto-AGI applications in different domains, sorta midway between narrow-AI and human-level AGI, including stuff like:
-- maidbots
-- AI financial traders that don't just execute machine learning algorithms, but grok context, adapt to regime changes, etc.
-- NL question answering systems that grok context and piece together info from different sources
-- artificial scientists capable of formulating nonobvious hypotheses and validating them via data analysis, including doing automated data preprocessing, etc.
And not to forget, of course, smart virtual pets and avatars in games and virtual worlds ;-))
Re: [agi] An Open Letter to AGI Investors
Hmmm... It's pretty hard to project the timing of different early-stage AGI applications, as this depends on the particular route taken to AGI, and there are many possible routes... We may well see a variety of proto-AGI applications in different domains, sorta midway between narrow-AI and human-level AGI, including stuff like:
-- maidbots
-- AI financial traders that don't just execute machine learning algorithms, but grok context, adapt to regime changes, etc.
-- NL question answering systems that grok context and piece together info from different sources
-- artificial scientists capable of formulating nonobvious hypotheses and validating them via data analysis, including doing automated data preprocessing, etc.
Then, after this phase, we may finally see the emergence of unified AGI systems with true human-level AGI. **Or**, it could happen that one of the above apps (or something not on my list) advances way faster than the others, for fundamental AI reasons or simply for practical economic reasons ... or due to luck... **Or**, it could well happen that someone gets all the way to human-level AGI before any of the above proto-AGI applications really becomes feasible and economically viable. In that case the answer will indeed be: Duh, the AGI can do anything... Which of these alternatives will happen is not obvious to me. It's not even obvious to me under the hypothetical assumption that the Novamente/OpenCog approach is gonna be the one that gets us to human-level AGI ... let alone if I drop that assumption and think about the problem from the perspective of the broad scope of possible AGI architectures.
So I am a bit perplexed that some folks on this list are so surpassingly **confident** as to which route is going to unfold I don't want to get all Eliezer on you, but really, some reflection on the human brain's tendency toward overconfidence might be in order ;-O -- Ben G On Thu, Apr 17, 2008 at 10:30 AM, J Storrs Hall, PhD [EMAIL PROTECTED] wrote: Well, I haven't seen any intelligent responses to this so I'll answer it myself: On Thursday 17 April 2008 06:29:20 am, J Storrs Hall, PhD wrote: On Thursday 17 April 2008 04:47:41 am, Richard Loosemore wrote: If you could build a (completely safe, I am assuming) system that could think in *every* way as powerfully as a human being, what would you teach it to become: 1) A travel Agent. 2) A medical researcher who could learn to be the world's leading specialist in a particular field,... Travel agent. Better yet, housemaid. I can teach it to become these things because I know how to do them. Early AGIs will be more likely to be successful at these things because they're easier to learn. This is sort of like Orville Wright asking, If I build a flying machine, what's the first use I'll put it to: 1) Carrying mail. 2) A manned moon landing. Q: You've got to be kidding. There's a huge difference between a mail-carrying fabric-covered open-cockpit biplane and the Apollo spacecraft. It's not comparable at all. A: It's only about 50 years' development. More time elapsed between railroads and biplanes. Q: Do you think it'll take 50 years to get from travel agents to medical researchers? A: No, the pace of development has speeded up, and will speed up more so with AGI. But as in the mail/moon example, the big jump will be getting off the ground in the first place. Q: So why not just go for the researcher? A: Same reason Orville didn't go for the moon rocket. 
We build Rosie the maidbot first because:
1) We know very well what it's actually supposed to do, so we know if it's learning it right.
2) We even know a bit about how its internal processing -- vision, motion control, recognition, navigation, etc. -- works or could work, so we'll have some chance of writing programs that can learn that kind of thing.
3) It's easier to learn to be a housemaid. There are lots of good examples. The essential elements of the task are observable or low-level abstractions. While the robot is learning to wash windows, we the AGI researchers are going to learn how to write better learning algorithms by watching how it learns.
4) When, not if, it screws up, a natural part of the learning process, there'll be broken dishes and not a thalidomide disaster.
The other issue is that the hard part of this is the learning. Say it takes a teraop to run a maidbot well, but a petaop to learn to be a maidbot. We run the learning on our one big machine and sell the maidbots cheap with 0.1% of the CPU. But being a researcher is all learning -- so each one would need the whole shebang for each copy. A decade of Moore's Law ... and at least that of AGI research. Josh
Re: [agi] database access fast enough?
On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: You *REALLY* need to get up to speed on current database systems before you make more ignorant statements. First off, *most* databases RARELY go to the disk for reads. Memory is cheap and the vast majority of complex databases are generally small enough that they are normally held in memory during normal operation. That's true as of now, but let's think one or two steps further: Do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely? Next, I suspect that whatever bundling you're talking about is likely to be along field boundaries and is likely going to be akin to just reading an entire FIELD table into memory (that will have the exact same structure as all other field tables but will be contiguous on disk so as to promote fast loads). To clarify what I mean:
1. the DB contains a large number of facts / rules (perhaps stored as rows in SQL parlance)
2. many of these rows have to be fetched for inference (Resolution tests if a rule leads to a successful proof, but more often than not the rules are discarded)
3. the rows are scattered all around the DB
For example, let's say I want to infer something about Harry Porter and JK Rowling. I would want to fetch these facts / rules:
1. Harry Porter is a successful book series
2. Harry Porter belongs to the fantasy genre
3. JK Rowling is the author of Harry Porter
4. JK Rowling is now richer than Queen Elizabeth II
etc... But I would probably NOT need facts / rules like:
1. Einstein is the creator of General Relativity
2. Water is heavier than oil
etc... So we should keep track of what rules are usually used *together*, and perhaps bring them into physically contagious storage. I'm not sure which DB feature(s) allow this...
YKY
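One way to get the physically contiguous storage YKY asks about, sketched here as a hypothetical application-side approach (no particular DB feature assumed): log which facts are fetched together during inference, then derive a storage order that could feed a CLUSTER key or index-organized table. All names below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Track how often pairs of facts are retrieved by the same inference.
co_use = Counter()

def record_inference(fetched_ids):
    for a, b in combinations(sorted(fetched_ids), 2):
        co_use[(a, b)] += 1

def clustered_order(all_ids):
    """Greedy ordering: always append the unplaced fact most
    strongly co-used with the fact just placed, so frequently
    co-retrieved facts end up physically adjacent."""
    remaining = set(all_ids)
    order = [remaining.pop()]
    while remaining:
        last = order[-1]
        best = max(remaining,
                   key=lambda x: co_use[tuple(sorted((last, x)))])
        remaining.remove(best)
        order.append(best)
    return order

# Facts 1-4 (the Rowling cluster) are always used together;
# facts 5-6 (Einstein, water/oil) form a separate cluster.
for _ in range(10):
    record_inference([1, 2, 3, 4])
    record_inference([5, 6])
order = clustered_order([1, 2, 3, 4, 5, 6])
```

After this, `order` keeps each co-used group contiguous, which is exactly the property a clustered storage layout would exploit.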
Re: [agi] database access fast enough?
Hi Stephen, Thanks for sharing this! VERY few people have experience with this stuff... On 4/17/08, Stephen Reed [EMAIL PROTECTED] wrote: 4. I began writing my own storage engine, for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer. That seems like what we actually need. My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size and I can therefore put any one of them in tmpfs - running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10 times speedup. If the inference requires a rule outside the sub-KB, you'd have to do a very expensive swap. I think this only works if you're sure the entire inference is contained within a sub-KB. YKY
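Stephen's tmpfs trick can also be approximated from inside an application. A hedged sketch using stdlib SQLite as a stand-in for Sesame (the schema is invented): copy one disk-resident KB partition wholesale into an in-memory database and serve all queries from RAM, accepting the expensive reload when the inference needs a different partition.

```python
import os
import sqlite3
import tempfile

# Build a small disk-resident KB partition.
path = os.path.join(tempfile.mkdtemp(), "kb_partition.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
disk.executemany("INSERT INTO triples VALUES (?,?,?)",
                 [("HarryPotter", "genre", "fantasy"),
                  ("JKRowling", "authorOf", "HarryPotter")])
disk.commit()

# Copy the whole partition into RAM -- the application-level
# analogue of mounting it on tmpfs.
ram = sqlite3.connect(":memory:")
disk.backup(ram)
disk.close()

# Queries now run entirely at RAM speed.
n = ram.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

As in the tmpfs setup, this only pays off if the working set of the inference fits inside the loaded partition.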
Re: [agi] database access fast enough?
Hi Mark, This is, by the way, my primary complaint about Novamente -- far too much energy, mind-space, time, and effort has gone into optimizing and repeatedly upgrading the custom atom table that should have been built on top of existing tools instead of being built totally from scratch. Really, work on the AtomTable has been a small percentage of work on the Novamente Cognition Engine ... and, the code running the AtomTable is now pretty much the same as it was in 2001 (though it was tweaked to make it 64-bit compatible, back in 2004 ... and there has been ongoing bug-removal as well...). We wrote some new wrappers for the AtomTable last year (based on STL containers), but that didn't affect the internals, just the API. It's true that a highly-efficient, highly-customizable graph database could potentially serve the role of the AtomTable, within the NCE or OpenCog. But that observation is really not such a big deal. Potentially, one could just wrap someone else's graph DB behind the 2007 AtomTable API, and this change would be completely transparent to the AI processes using the AtomTable. However, I'm not convinced this would be a good idea. There are a lot of useful specialized indices in the AtomTable, and replicating all this in some other graph DB would wind up being a lot of work ... and we could use that time/effort on other stuff instead Using a relational DB rather than a graph DB is not appropriate for the NCE design, however. But we've been over this before... And, this is purely a software implementation issue rather than an AI issue, of course. The NCE and OpenCog designs require **some** graph or hypergraph DB which supports the manual and automated creation of complex customized indices ... and supports refined cognitive control over what lives on disk and what lives in RAM, rather than leaving this up to some non-intelligent automated process. Given these requirements, the choice of how to realize them in software is not THAT critical ... 
and what we have there now works -- Ben G
Re: [agi] database access fast enough?
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote: Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here. Thanks... clustered indexing looks promising, but I need to study it in more detail to see if it really solves the problem... YKY
Re: [agi] database access fast enough?
YKY said: If the inference requires a rule outside the sub-KB, you'd have to do a very expensive swap. I think this only works if you're sure the entire inference is contained within a sub-KB. Right. I envision Texai deployed as distributed agents operating within a hierarchical control system. Each agent's mission will be scoped to require immediate access to only a cache of some KB partition. Hopefully infrequent, cache misses will incur the penalty you mention, either to local disk, or worse - to the network. I also expect the system to be adaptive to whatever the user's computer allows with regard to resources (e.g. more RAM begets faster response). I am also considering torrent-style transfers to satisfy cache misses. As you point out, an AGI's KB query is likely to access other linked objects (e.g. spreading activation search). So given that users will likely have asymmetric Internet connection bandwidth, it may be faster for large chunks of cache-filling KB data to be obtained simultaneously in slices from a multitude of collaborating peer agents. -Steve Stephen L. Reed Artificial Intelligence Researcher http://texai.org/blog http://texai.org 3008 Oak Crest Ave. Austin, Texas, USA 78704 512.791.7860
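The agent-side cache Stephen describes might look roughly like this minimal LRU sketch. The backing dict stands in for local disk or the network, and every name here is invented; the point is only that hits are cheap and misses pay the full fetch penalty.

```python
from collections import OrderedDict

class KBCache:
    """LRU cache over a slow backing store (disk or network)."""
    def __init__(self, backing, capacity):
        self.backing = backing
        self.capacity = capacity
        self.cache = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark most-recently-used
            return self.cache[key]
        self.misses += 1                     # expensive: disk/network fetch
        value = self.backing[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least-recently-used
        return value

store = {i: f"fact-{i}" for i in range(100)}   # stand-in for the full KB
cache = KBCache(store, capacity=4)

# A mission-scoped working set: only the first access of each
# fact misses; every repeat is served from RAM.
for key in [1, 2, 3, 1, 2, 3, 1, 2, 3]:
    cache.get(key)
```

When the working set exceeds `capacity`, misses recur, which is exactly the "very expensive swap" case YKY raises.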
Re: [agi] database access fast enough?
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote: For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. I analyzed the scalability of distributed indexing for my thesis (linked at http://www.mattmahoney.net/agi.html ). For data randomly distributed in a vector space model up to log n dimensions, storage is O(n log n), retrieval time is effectively O(log n) and update is O(log^2 n). In practice you can do better because data tends to cluster, reducing the effective number of dimensions, and because accesses tend to be distributed nonuniformly. Data accessed frequently will tend to be cached in nearby nodes. I realize you are asking about the relational model, but you can implement the most common transactions, e.g. retrieving or updating a small number of records at a time, by storing records of the form author timestamp table field=value field=value This also gives you transaction logging, rollback, and authentication, which will be important in any database with lots of users (I assume AGI). However I don't think it will be as powerful as records of the form author timestamp arbitrary_text. To use an example, If a lot of people search for Harry Porter, then a conventional database system would make future retrieval of the Harry Porter node faster. But the requirement of the inference system is such that, if Harry Porter is fetched, then we would want *other* things that are associated with Harry Porter to be retrieved faster in the future, for example items such as JK Rowling or fantasy fiction. A huge relational database would retrieve the fact that Harry Porter won a gold medal for the high jump in the 1908 Olympics. 
A better language model (like Google) might figure out that you meant Harry Potter :-) -- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] database access fast enough?
Clustered indexing *WILL* solve your problem if you're willing to include all the data you're going to need in the index. It's definitely a trade-off . . . . but arguably a solid one. - Original Message - From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 1:03 PM Subject: Re: [agi] database access fast enough? On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote: Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here. Thanks... clustered indexing looks promising, but I need to study it in more detail to see if it really solves the problem... YKY
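Mark's point can be made concrete with SQLite's WITHOUT ROWID tables, which are index-organized in the same spirit as a SQL Server clustered index or an Oracle index-organized table: the rows *are* the index, so including all the needed columns in the key structure means a topic's facts are stored contiguously and fetched with one range scan. The schema is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows are physically organized by the primary key,
# so all facts sharing a topic prefix sit next to each other.
conn.execute("""
    CREATE TABLE facts (
        topic TEXT, seq INTEGER, body TEXT,
        PRIMARY KEY (topic, seq)
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO facts VALUES (?,?,?)", [
    ("HarryPotter", 1, "is a successful book series"),
    ("HarryPotter", 2, "belongs to the fantasy genre"),
    ("Einstein",    1, "created General Relativity"),
])

# One range scan over the clustered key retrieves the whole topic.
rows = conn.execute(
    "SELECT body FROM facts WHERE topic = ? ORDER BY seq",
    ("HarryPotter",)
).fetchall()
```

The trade-off Mark mentions is visible here: everything you want retrieved together has to live in the key-organized structure, which costs space and write speed.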
Re: [agi] database access fast enough?
On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. Also, I'm designing a learning algorithm that stores *hypotheses* in the KB along with accepted rules. This will multiply the size of the KB by a factor. YKY PS: In my last message, contagious should be contiguous... =)
Re: [agi] database access fast enough?
On Apr 17, 2008, at 9:08 AM, Mark Waser wrote: Yes, the newest versions of PostgreSQL could spank SQL Server 2000 after it was several years old. One tremendous advantage of PostgreSQL is its very short development cycle. Actually, this was a fundamental and known weakness in the SQL Server 2000 transactional model, being more like DB2 than Oracle. Because PostgreSQL has used the same kind of model as Oracle -- and for a very long time -- it has always been relatively strong at OLTP throughput. Until SQL Server 2005, the Microsoft offering was never really competitive. It had little to do with development timelines. On the other hand, PostgreSQL was a bit of a dog at OLAP until relatively recently. You imply that the performance is due to some kind of linear development path, but in fact SQL Server 2005 changed its internal model to be like Oracle and PostgreSQL so that it could be competitive at OLTP. It is a matter of algorithm selection and tradeoffs, not engineering effort. SQL Server (until two years ago) has always had relatively poor lock concurrency, but gave very good baseline OLAP performance as a consequence of that decision. The reality is that it is much easier to make the Oracle/Postgres model perform satisfactorily at OLAP than to make the old SQL Server model perform satisfactorily at OLTP. -- in 2005 they switched to a transaction model more like PostgreSQL and Oracle and gained some parity. SQL Server still does not really distribute all that easily, unlike Oracle or PostgreSQL. Have you ever worked with an Oracle distributed database? Oracle does not distribute well. I've worked with very large databases on several major platforms, including Oracle and SQL Server in many different guises. Oracle's parallel implementation may not distribute that well, but that is because traditional transactional semantics are *theoretically incapable* of distributing well. To the extent it is possible at all, Oracle does a very good job at making it work.
There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. Oracle only scales well when you know how to properly use it. In most installations that I've seen, Oracle underperforms even SQL Server 2000 because the DBA didn't do the necessary work to make it perform optimally (because Oracle is *NOT* average person friendly). I've made *a lot* of money optimizing people's Oracle installations that I shouldn't have been able to make if Oracle could get out of its own way. No argument here, one of the major problems of Oracle is that it is bloody impossible to use well without a full-time staff. I spent many years solving scaling problems on extremely large Oracle systems. The insidiousness of PostgreSQL in the market is that it is very Oracle-like at a high level but *massively* simpler and easier to use and administer while still delivering much of the performance and a significant subset of the features of Oracle. SQL Server has done well against Oracle for similar reasons. The main problem with SQL Server these days is that it does not run on Unix. Most of the major historical suckiness does not apply to the current version. Making deep and very flexible customization a safe core feature was a design decision tradeoff in PostgreSQL that is somewhat unique to it. You can do a lot of really cool software implementation tricks with it that Oracle and SQL Server do not do. Yes. The biggest problems with PostgreSQL are that it doesn't have a Microsoft compatibility mode and it isn't clear to corporations where you can get *absolutely guaranteed* support. Sun Microsystems not only officially supports it, they do a lot of development on it, as does Fujitsu in Asia, Red Hat and a few other large companies that are heavily invested in it.
A significant portion of the main PostgreSQL developers do it as their official corporate job. PostgreSQL is very broadly ANSI compatible (including a lot of ancillary database standards surrounding SQL), and to the extent it has a flavor it clearly borrows from Oracle rather than SQL Server. SQL Server has a lot of bits that do not conform to standards that everyone else supports. From a historical perspective, PostgreSQL shares a transaction model with Oracle, started on Unix, and has been around since a time when SQL Server was not something you would want to emulate. PostgreSQL has matured to the point where it mostly follows standards to the extent possible but has enough unique features and capabilities that it has started to become a flavor of its own. If you could swap out an MS-SQL server *immediately* for a PostgreSQL server simply by copy the data and rebinding a WINS
Re: [agi] database access fast enough?
Everyone, At startup, I simply had Dr. Eliza cycle through the heavily used part of the DB, so that it would run in RAM except for unusual access. Of course, its demo DB now easily fits into RAM. VM paging was a MUCH worse problem than is DB access. I suspect that unless you lock the code into RAM, this may well forever be the case, because less-used routines (e.g. exception handlers) will get pushed out of RAM by the DB engine's scramble for buffer space, which of course you can limit by tweaking the DB engine. Also, has anyone here looked at using Flash Disks for DB? Vista now puts VM onto any available flash drives to gain performance. On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: That's true as of now, but let's think one or two steps further: Do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely? Yes. RAM is *HUGE*. Intelligence is *NOT*. Hmm, thinking on the keyboard... ~100E9 computing cells with ~50K inputs each, of which ~200 are active. One theory is that you would only have to carry the active inputs, plus some fraction of the inactive inputs while you watched for things to happen to make them active. Let's say that we must track ~1E3 inputs, for a total of 100E12 or one hundred trillion inputs. We could use fractal means to generate the original configuration (as biological brains probably do), very low precision arithmetic with statistical rounding, etc., which would reduce each input to just a few bytes to maintain, say ~10. This makes a total of 1E15 or one quadrillion bytes to represent a simulated human's instantaneous state of construction. An entire checkpoint would take little more, because it would only include in addition the electrical state of each of the 100E9 cells. Note however, that the *FUNCTIONAL* state would only be 1/5 of this estimate because 4/5 of the represented inputs are presently inactive, for a total of only 200 terabytes.
Note that ~90% of those 100E9 cells are slow-responding glial cells, so while the state is large, the computational requirements may be well short of a petaflop. Of course, this makes a LOT of assumptions that no one has yet bothered to confirm in the laboratory, and I do NOT want to ignite an estimates war, so I invite constructive comments from anyone with more recent data than I have. Steve Richfield
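Steve's back-of-envelope numbers can be checked mechanically, using exactly the assumptions stated in his message (all of them, as he says, unconfirmed):

```python
# Assumptions as stated in the text, all unconfirmed estimates.
cells           = 100 * 10**9   # ~100E9 computing cells
tracked_inputs  = 1000          # ~1E3 inputs tracked per cell
bytes_per_input = 10            # low-precision state per input
active_per_cell = 200           # ~200 of the ~1000 tracked are active

total_inputs = cells * tracked_inputs              # 100E12 inputs
state_bytes  = total_inputs * bytes_per_input      # 1E15 bytes = 1 PB

# Functional state carries only the active 1/5 of tracked inputs.
# Note: one fifth of 1 PB works out to 200 TB.
functional_bytes = state_bytes * active_per_cell // tracked_inputs
```

Under these assumptions the full state is one petabyte and the functional subset is 200 terabytes, which is within reach of a storage cluster even at 2008 densities.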
Re: [agi] database access fast enough?
On Thu, Apr 17, 2008 at 2:42 PM, Mark Waser [EMAIL PROTECTED] wrote: Really, work on the AtomTable has been a small percentage of work on the Novamente Cognition Engine ... and, the code running the AtomTable is now pretty much the same as it was in 2001 (though it was tweaked to make it 64-bit compatible, back in 2004 ... and there has been ongoing bug-removal as well...). And . . . and . . . and . . . :-) It's far more than you're admitting to yourself. :-) That's simply not true, but I know of no way to convince you. The AtomTable work was full-time work for two guys for a few months in 2001, and since then it's been occasional part-time tweaking by two people who have been full-time engaged on other projects. We wrote some new wrappers for the AtomTable last year (based on STL containers), but that didn't affect the internals, just the API. Which is what everything should have been designed around anyways -- so effectively, last year was a major breaking change that affected *all* the software written to the old API. Yes, but calls to the AT were already well-encapsulated within the code, so changing from the old API to the new has not been a big deal. Absolutely. That's what I'm pushing for. Could you please, please publish the 2007 AtomTable API? That's actually far, far more important than the code behind it. Please, please . . . . publish the spec today . . . . pretty please with a cherry on top? It'll be done as part of the initial OpenCog release, which will be pretty soon now ... I don't have a date yet though... However, I'm not convinced this would be a good idea. There are a lot of useful specialized indices in the AtomTable, and replicating all this in some other graph DB would wind up being a lot of work ... and we could use that time/effort on other stuff instead Which (pardon me, but . . .
) clearly shows that you're not a professional software engineer I'm not but many other members of the Novamente team are My contention is that you all should be *a lot* further along than you are. You have more talent than anyone else but are moving at a truly glacial pace. 90% of Novamente LLC's efforts historically have gone into various AI consulting projects that pay the bills. Now, about 60% is going into consulting projects, and 40% is going into the virtual pet brain project We have very rarely had funding to pay folks to work on AGI, so we've worked on it in bits and pieces here and there... Sad, but true... I understand that you believe that this is primarily due to other reasons but *I am telling you* that A LOT of it is also your own fault due to your own software development choices. You're wrong, but arguing the point over and over isn't getting us anywhere. Worse, fundamentally, currently, you're locking *everyone* into *your* implementation of the atom table. Well, that will not be the case in OpenCog. The OpenCog architecture will be such that other containers could be inserted if desired. Why not let someone else decide whether or not it is worth their time and effort to implement those specialized indices on another graph DB of their choice? If you would just open up the API and maybe accept some good enhancements (or, maybe even, if necessary, some changes) to it? Yes, that's going to happen within OpenCog. Using a relational DB rather than a graph DB is not appropriate for the NCE design, however. Incorrect. If the API is identical and the speed is identical, whether it is a relational db or a graph db *behind the scenes* is irrelevant. Design to your API -- *NOT* to the underlying technology. You keep making this mistake. The speed will not be identical for an important subset of queries, because of intrinsic limitations of the B-tree datastructures used inside RDB's. We discussed this before. 
Seriously -- I think that you're really going to be surprised at how fast OpenCog might take off if you'd just relax some control and concentrate on the specifications and the API rather than the implementation issues that you're currently wasting time on. I am optimistic about the development speedup we'll see from OpenCog, but not for the reason you cite. Rather, I think that by opening it up in an intelligent way, we're simply going to get a lot more people involved, contributing their code, their time, and their ideas. This will accelerate things considerably, if all goes well. I repeat that NO implementation time has been spent on the AtomTable internals for quite some time now. A few weeks was spent on the API last year, by one person. I'm not sure why you want to keep exaggerating the time put into that component, when after all you weren't involved in its development at all (and I didn't even know you when the bulk of that development was being done!!) I don't care if, in OpenCog, someone replaces the AtomTable internals with something
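A minimal sketch of the pluggable-container idea discussed above. Every name here is invented -- the real AtomTable API is unpublished at this point in the thread -- but it shows the shape of the argument: AI processes code against an interface, and the container behind it (the current custom store, a graph DB, anything else) can be swapped without touching calling code.

```python
from abc import ABC, abstractmethod

class AtomStore(ABC):
    """Hypothetical AtomTable-style interface (names invented)."""
    @abstractmethod
    def add(self, atom_id, atom_type, links): ...
    @abstractmethod
    def incoming(self, atom_id): ...

class DictAtomStore(AtomStore):
    """Reference in-memory container with an incoming-set index.
    A graph DB backend could implement the same two methods."""
    def __init__(self):
        self.atoms = {}             # id -> (type, outgoing links)
        self.incoming_index = {}    # id -> set of ids linking to it

    def add(self, atom_id, atom_type, links):
        self.atoms[atom_id] = (atom_type, list(links))
        for target in links:
            self.incoming_index.setdefault(target, set()).add(atom_id)

    def incoming(self, atom_id):
        return self.incoming_index.get(atom_id, set())

store = DictAtomStore()
store.add("rowling", "ConceptNode", [])
store.add("potter", "ConceptNode", [])
store.add("wrote", "EvaluationLink", ["rowling", "potter"])
```

The incoming-set index illustrates Ben's concern: any replacement backend must replicate such specialized indices, or the interface contract (and hence the AI processes relying on it) breaks.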
Re: [agi] database access fast enough?
Actually, this was a fundamental and known weakness in the SQL Server 2000 transactional model, being more like DB2 than Oracle. I disagree. First off, we're talking about the DEFAULT transactional model, locking mode, and where new records are placed. It has always been possible to tweak any of the databases to the other's transactional model. Second of all, it was not a weakness -- it was a deliberate choice of optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for most databases on limited memory machines with low OLTP requirements, this was the correct choice until ballooning memories made the reverse true). Because PostgreSQL has used the same kind of model as Oracle -- and for a very long time -- it has always been relatively strong at OLTP throughput. Until SQL Server 2005, the Microsoft offering was never really competitive. Bull. For anything except the heaviest OLTP loads, Microsoft was more than adequate. You don't need a semi to drive the highways. It had little to do with development timelines. On the other hand, PostgreSQL was a bit of a dog at OLAP until relatively recently. See? You're making my point. :-) You imply that the performance is due to some kind of linear development path, but in fact SQL Server 2005 changed its internal model to be like Oracle and PostgreSQL so that it could be competitive at OLTP. It is a matter of algorithm selection and tradeoffs, not engineering effort. SQL Server (until two years ago) has always had relatively poor lock concurrency, but gave very good baseline OLAP performance as a consequence of that decision. The reality is that it is much easier to make the Oracle/Postgres model perform satisfactorily at OLAP than to make the old SQL Server model perform satisfactorily at OLTP. Again, you're making my point. Until memory became cheap and OLTP became more critical, Microsoft made the right choice of OLAP over OLTP. When the world changed, so did they.
I'd call that a strength and flexibility, not a weakness. I've worked with very large databases on several major platforms, including Oracle and SQL Server in many different guises. Oracle's parallel implementation may not distribute that well, but that is because traditional transactional semantics are *theoretically incapable* of distributing well. To the extent it is possible at all, Oracle does a very good job at making it work. So, is your claim that Oracle distributes better than Microsoft? If so, why? There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. And PostgreSQL will probably implement them long before Oracle or MS. Sun Microsystems not only officially supports it, they do a lot of development on it, as does Fujitsu in Asia, Red Hat and a few other large companies that are heavily invested in it. A significant portion of the main PostgreSQL developers do it as their official corporate job. Cool. I wasn't aware that it had made that many inroads. Awesome. PostgreSQL is very broadly ANSI compatible (including a lot of ancillary database standards surrounding SQL), and to the extent it has a flavor it clearly borrows from Oracle rather than SQL Server. SQL Server has a lot of bits that do not conform to standards that everyone else supports. From a historical perspective, PostgreSQL shares a transaction model with Oracle, started on Unix, and has been around since a time when SQL Server was not something you would want to emulate. PostgreSQL has matured to the point where it mostly follows standards to the extent possible but has enough unique features and capabilities that it has started to become a flavor of its own. 
If you could swap out an MS-SQL server *immediately* for a PostgreSQL server simply by copying the data and rebinding a WINS name or an IP address, I would be in hog heaven even if support wasn't absolutely guaranteed since I could always switch back. Given that there's a huge transition cost (changing scripts, procedures, etc.), I can't get *ANY* agreement for the thought of switching (and I'm sure that there are *MANY* more in my circumstances). The only corporate database that relatively easily ports back and forth with PostgreSQL is Oracle. Nonetheless, a number of people have ported applications to PostgreSQL from MS-SQL with good results; questions about porting nuances come up regularly on the PostgreSQL mailing lists. Beyond your basic ANSI compliance, database portability only sort of exists. Inevitably people use non-standard platform features that expose the specific capabilities of the engine being used to maximize performance. As a practical matter, you pick a database platform and stick with it as long as is reasonably possible.
Re: [agi] Comments from a lurker...
Mark, On 4/16/08, Mark Waser [EMAIL PROTECTED] wrote: True, but this is inherent with ALL less than perfectly understood systems and is not in any way peculiar to Dr. Eliza. Extrapolations are inherently hazardous, sometimes without reasonable limit. Correct. Part of the point to AGI is to automatically create knowledge bases that are as complete as possible. Dr. Eliza seems to be a reasonable attempt to use a small amount of cherry-picked knowledge to solve a wide but not complete range of unsolved problems of a given type -- and has all of the standard inherent advantages and disadvantages of that approach. Wouldn't you agree? Yes. There were a bunch of them and I don't claim to be a historian. As I understood those methods they used two kinds of expertise - one of which was similar to the symptoms and conditions that I use, and another that guided the repair process. Dr. Eliza does without the guidance. This has the advantage that it works with inept experts, and the disadvantage that it can be less efficient than if it had good guidance. I had to find a grand heuristic to replace expert-entered probabilities and the rest of that guidance. After lots of experimenting, that grand heuristic turned out to be incredibly simple, buried in the symptom weighting for various conditions, being that you count the first potential symptom (or its verified absence) as 80%, the next one as 80% * 20% = 16%, the third as 80% * 4% = 3%, etc. This gives a lot of weighting to the leading symptoms, but nonetheless seemed to work well. Wow! That's a *really* wicked tail-off. Seems really counter-intuitive. Yes - it surprised me too, and it took a bunch of effort for me to get a good handle on why it worked, because I REALLY don't like my software to depend on things that I don't understand. It comes from Shannon's information theory. The amount of information in a datum is most dependent on the attendant noise. 
If you had a perfect symptom that exactly tracked a cause-and-effect chain link, then you would do best to ignore all other symptoms, regardless of whether they supported or contradicted the perfect symptom. In our less-than-perfect world, the list of potentially useful symptoms is usually short, and the noise comes from other cause-and-effect chain links that may exhibit substantially identical symptoms. If you have two symptoms, one with high noise and one with low noise, you do best by substantially ignoring the noisy symptom. The key to separating links using noisy symptoms is to use more than one noisy symptom, each hopefully with uncoupled noise. When your knowledge composer KNOWS about the 80% roll-off, they CAREFULLY select which symptoms to use and which to ignore, with a secondary human effect of keeping the knowledge composer from throwing in everything but the kitchen sink along with the dirty wash water. Note further that unmentioned symptoms are NOT significantly considered in computing the result, only those that are affirmed or denied. This means that the third symptom in the list, which would only have a 3% effect among others, has a 100% effect if it is alone. This produces noisy results - Dr. Eliza reports 100% interim probability, but fails to mention the 50% noise factor, and continues to press the user to answer questions about the two symptoms that precede the 3% symptom that is currently driving everything. Note also that the 3% symptom is probably also driving other potential conditions where it may be earlier in the list, and those conditions may also be inserting their own questions. To separate the various 100%s in interim results, I added a heuristic to slightly reduce the 100% results in proportion to how far down the list the first confirmed/denied symptom is. In typical use, there are often as many negative results (from denied symptoms) as positive results! What could a negative probability possibly mean? 
Not only do we have no believable evidence of the associated condition, but if natural forces were to try to force it, those forces would probably fail approximately the indicated percentage of the time. I'm not sure what you mean by guided the repair process Where the expert's model of a decision tree, questioning, significance of symptoms, etc., is used instead of the engine's own generated one, which may annoy the knowledge composer. It is interesting to watch others composing for Dr. Eliza, because they have their own ideas of how to proceed in the presence of certain symptoms, ideas that may be at wide variance with Dr. Eliza's approach. So far, discussing this with them at length has yielded that there really isn't any good reason for doing it their way, and that by letting Dr. Eliza do its own thing, inputting is a LOT easier. Note that there are NO expert-entered percentages in the Knowledge.mdb, which seems to result in BETTER operation because experts almost as often lead things astray with myths as guide
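The 80% roll-off Steve describes can be written down directly. What follows is a hedged numeric sketch of that weighting scheme as described in this thread -- the function name is mine, not Dr. Eliza's actual code:

```python
# Hypothetical sketch of the 80% roll-off weighting described above:
# the leading symptom gets 80% of the weight, and each subsequent
# symptom gets 80% of the 20% that remains after the one before it.
def symptom_weights(n):
    """Weight assigned to the i-th potential symptom (0-indexed)."""
    return [0.8 * 0.2 ** i for i in range(n)]

weights = symptom_weights(4)
# roughly [0.8, 0.16, 0.032, 0.0064] -- the 80%, 16%, ~3% tail-off above
```

Note how quickly the tail dies off: by the third symptom the contribution is already down to about 3%, which is why a lone third symptom behaving as if it had 100% effect produces the noisy interim results described above.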
Re: [agi] associative processing
Derek On 4/16/08, Derek Zahn [EMAIL PROTECTED] wrote: Steve Richfield, writing about J Storrs Hall: You sound like the sort that once the thing is sort of roughed out, likes to polish it up and make it as good as possible. I don't believe your characterization is accurate. You could start with this well-done book to check that opinion: http://www.amazon.com/Beyond-AI-Creating-Conscience-Machine/dp/1591025117 Very interesting. Because you are new to the discussion here you probably don't quite get the topic of this mailing list (AGI); I think that I do - see comments after addressing your other comments. the system sort-of described in your papers I described TWO systems. The one in this thread I specifically designed with a mind to eventually emulate YOU, neuron-by-neuron, synapse-by-synapse, in real time. The one mentioned in my Comments from a lurker thread mentions Dr. Eliza, which is designed to solve difficult problems in simple ways that billions of people have missed for a million years, and very likely ANY astronomically-sized AGI machine would miss for centuries. It was unclear how AGI was supposed to quickly do something that was only possible after 10E14 human years of wars and other strife, without having to go through, and even potentially cause, the same. Proof by example to me, but apparently still not yet to the remainder of this group, is that there ARE really important things that can only be solved inductively, and that socialized AGI-like humans have SO little inductive ability that even relatively simple concepts have simply escaped human capabilities for a thousand millennia. I clearly understand this because my own native inductive abilities are also in short supply. I had to hang on by my fingernails just to get through differential equations. I eventually developed my own assortment of mental crutches to survive my shortfall in native inductive ability, which were subsequently expanded upon to form Dr. Eliza's concept and innards. 
does not address any of the issues of that topic (as defined in its core publications and conferences) so don't be too surprised if people here are not particularly excited about it. Hmm, I haven't seen a reference to those core publications. Is there a semi-official list? Much of what is presently known about human neuro-anatomy comes to people from the writings of Dr. William Calvin. I was his assistant at the U of W Department of Neurological Surgery. That was AFTER I had performed one of the first neurological simulations and the first known to have categorized inputs via unsupervised learning. We held each other's feet to the fire, me for being wet-science correct, and Calvin for models that performed good-math computations. Everyone knows about synapses performing weighted accumulations, but few people know that many/most integrate and differentiate, and that inhibitory synapses are typically VERY non-linear with some VERY interesting transfer functions, etc. I published a paper at the first IJCNN in San Diego explaining how everything pointed to wet neurons generally computing with the logarithms of probabilities of assertions being true. That simple fact should have guided future research, but lab researchers not being mathematicians, and neither going to NN conferences, this guiding fact has died away like the echo of some long-forgotten noise. When a tree falls in the forest... My son has beliefs that closely match those expressed by others on this forum, and we sometimes have long arguments about what is and is not reasonable for a human scale neural simulation program - beyond more all-too-human stupidity. My son has also developed the best known (and acknowledged as such at an unrelated WORLDCOMP presentation) general purpose neural net simulation program that runs on a PC, that is at once fast, flexible, and well-instrumented. 
It has good-looking graphics (that look like contemporary test instruments with fantastic abilities) and is able to stick its tentacles deeply into other applications (like flight simulator) to provide interactive input. I give him all the support that I can, but I still question where this is all going. His program is (presently) written in VB.net, converted from its earlier VB. My own personal interest is in living forever, but regardless of how expanded my brain might become, I suspect that I will STILL have the shortcomings that this sort of architecture brings with it, scary though that might be. THAT was part of my motivation for designing Dr. Eliza, which (it appears to me) could quickly (like in a year of adequate funding) grow beyond any AGI's future problem-solving abilities. It may take the likes of an evolved Dr. Eliza to provide the problem solving ability needed to design the AGI that people are discussing here. Steve Richfield --- agi Archives: http://www.listbox.com/member/archive/303/=now RSS Feed:
Re: [agi] database access fast enough?
On Apr 17, 2008, at 12:20 PM, Mark Waser wrote: It has always been possible to tweak any of the databases to the other's transactional model. Eh? Choices in concurrency control and scheduling run very deep in a database engine, with ramifications that cascade through every other part of the system. Equivalent transaction isolation levels can behave very differently in practice depending on the internal transaction representation and management model. You cannot turn off these side-effects, and you cannot tweak a non-MVCC-ish model to behave like an MVCC-ish model at runtime in any way that matters. Second of all, it was not a weakness -- it was a deliberate choice of optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for most databases on limited memory machines with low OLTP requirements, this was the correct choice until ballooning memories made the reverse true). The rise of the Internet, with its massive OLTP load characteristic, kind of settled the issue. It is true though that Oracle-like OLTP monsters have significantly higher resource overhead for storing the same set of records. These days it is concurrency bottlenecks that will kill you. So, is your claim that Oracle distributes better than Microsoft? If so, why? Very mature implementation of the concepts, and almost every conceivable mechanism and model for doing it is hidden under the hood. Remember, they started introducing the relevant concepts ages ago in Oracle 7, though in practice it was mostly unusable until relatively recently. Consequently, their implementation is easily the most general in that it works moderately well across the broadest number of use cases because they've been tweaking that aspect for years. Other commercial implementations tend to only work for a much narrower set of use cases. In short, Oracle has a long head start. 
There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. And PostgreSQL will probably implement them long before Oracle or MS. Ironically, a specific design decision that has created a fair amount of argument for years makes PostgreSQL the engine starting from the closest design point. PostgreSQL does not support threading and only uses a single process per query execution, originally for portability and data safety reasons -- the extreme hackability would be difficult to achieve otherwise. This made certain types of trivial parallelism for OLAP difficult. On the other hand, it has had distributed lock functionality for a number of versions now. If you look at newer models explicitly designed to make transactional databases scale better across distributed systems, you find that they are built on a design requirement of single processes per resource, strict access serialization, no local parallelism, and distributed locks. Which is not that far removed from where PostgreSQL is today, if you remove massive local concurrency support and its high overhead. There are a number of outfits (see www.greenplum.com for a very advanced implementation) that have hacked PostgreSQL to scale across very large clusters for OLAP by essentially making the necessary tweaks to approximate these types of models. The next step would be to rip out a lot of expensive bits based on classical design assumptions that make distributed write loads scale poorly. In a sense, a design choice that has traditionally put some limits on scaling PostgreSQL for OLAP put it in exactly the right place to make implementation of next-generation architectures as natural an evolution as can be expected in this case. J. 
Andrew Rogers
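The MVCC-versus-locking distinction Rogers argues here can be illustrated with a toy version-visibility check. This is a deliberately naive sketch of the general idea (all names are mine, and no real engine stores rows this simply): writers append new row versions rather than overwriting in place, so readers with older snapshots never block and never see half-finished updates.

```python
# Toy illustration of the MVCC idea: each row version records which
# transaction created it (xmin) and which superseded it (xmax), and a
# reader sees only the versions visible to its transaction snapshot.
class RowVersion:
    def __init__(self, value, xmin, xmax=None):
        self.value = value
        self.xmin = xmin   # id of the transaction that created this version
        self.xmax = xmax   # id of the transaction that superseded it, if any

def visible(row, snapshot):
    """Visible if created at or before the snapshot and not yet superseded."""
    return row.xmin <= snapshot and (row.xmax is None or row.xmax > snapshot)

# Transaction 5 updates the row: the old version is marked superseded
# and a new version is appended alongside it.
versions = [RowVersion("a", xmin=1, xmax=5), RowVersion("b", xmin=5)]

read_at_3 = [r.value for r in versions if visible(r, 3)]  # -> ["a"]
read_at_6 = [r.value for r in versions if visible(r, 6)]  # -> ["b"]
```

The point of the sketch is the side-effect Rogers describes: visibility is decided per reader from immutable version metadata, which is structurally different from a lock-based engine where the update would have blocked the snapshot-3 reader outright. That behavior cannot be "tweaked in" at runtime.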
Re: [agi] database access fast enough?
On Apr 17, 2008, at 12:26 PM, Mark Waser wrote: Actually, it's far worse than that. For serious systems, most of the heavy lifting is done inside the database with stored procedures which are not standard AT ALL. SQL is reasonably easy to port. Stored procedures that do a lot of work are not. The standard is SQL/PSM, which looks similar to Oracle's PL/SQL (and PostgreSQL's pl/pgsql). As a practical matter, support is not consistent enough or widespread enough for it to be entirely usable for purposes of portability though it is getting better. To be fair, full SQL/PSM support will not be core in PostgreSQL until the next release. J. Andrew Rogers
RE: [agi] associative processing
Steve Richfield writes: Hmm, I haven't seen a reference to those core publications. Is there a semi-official list? This list is maintained by the Artificial General Intelligence Research Institute. See www.agiri.org . On that site there are several semi-official lists -- under Publications and Instead of an AGI Textbook. Certainly there is very little agreement (on anything!) amongst the idiosyncratic group of people who post on this list and I did not intend to dissuade you from presenting your ideas (which I have found interesting so far, in proportion to the degree they address AGI topics); I was just explaining why people here are unlikely to find Dr. Eliza to be particularly interesting.
Re: [agi] Sending attachments to the list
Richard, I presume that you were referring to (worst offender) ME here! On 4/16/08, Richard Loosemore [EMAIL PROTECTED] wrote: Just a quick reminder about list protocol: if you want to send someone a document (especially a pdf), please remember to send it to their personal email address, rather than send it to the entire list. One such PDF ended up connecting with Josh who is REALLY into hardware design, and a really interesting thread developed that, who knows, may lead to the magic chip needed to implement AGI. This alone may be worth the overhead! However, I DID send a bunch of stuff before I realized that it was going to the entire list - sorry about that. I will limit myself to at most one CAREFULLY-chosen PDF per posting in the future. I propose that someone move this list to Yahoo, which provides storage space, along with many other useful tools, like surveys. Or, better yet, make it available on a website. Some of us still collect their mail on a low bandwidth connection sometimes, Including me. and it can be hell to wait 20 minutes just to check your mail. Not with Gmail and other web-based services. They keep the attachments on their servers until they are clicked on. Who the heck uses POP3 with dialup? Steve Richfield
RE: [agi] associative processing
Note that the Instead of an AGI Textbook section is hardly fleshed out at all at this point, but it does link to a more-complete similar effort to be found here: http://nars.wang.googlepages.com/wang.AGI-Curriculum.html
Re: [agi] database access fast enough?
YKY, I agree with your side of the debate about the whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert. But my opinion of the OpenCyc concept coverage with respect to that of a human five-year old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations: wet hair mother, far away mother, angry mother, mother hidden from view, mother in a crowd, mother's voice, mother in dim light, mother from below, and so on. Of course you can ignore fully grounded concepts as does current Cycorp for its applications, and as I will with Texai until it is past the bootstrap stage. -Steve Stephen L. Reed Artificial Intelligence Researcher http://texai.org/blog http://texai.org 3008 Oak Crest Ave. Austin, Texas, USA 78704 512.791.7860 - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 3:58:43 PM Subject: Re: [agi] database access fast enough? On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. And your *opinion* has what basis? Are you arguing that RAM isn't huge? That's easily disprovable. Or are you arguing that intelligence is huge? That too is easily disprovable. Which one do I need to knock down? The current OpenCyc KB is ~200 Mbs (correct me if I'm wrong). 
The RAM size of current high-end PCs is ~10 Gbs. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5 year-old human intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1000-1 times the real KB. That comes to 500Gb - 20Tb. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now. YKY
Computational requirements of AGI (Re: [agi] database access fast enough?)
--- Steve Richfield [EMAIL PROTECTED] wrote: On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: That's true as of now, but let's think one or two steps further: Do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely? Yes. RAM is *HUGE*. Intelligence is *NOT*. Hmm, thinking on the keyboard... ~100E9 computing cells with ~50K inputs each, of which ~200 are active. One theory is that you would only have to carry the active inputs, plus some fraction of the inactive inputs while you watched for things to happen to make them active. Let's say that we must track ~1E3 inputs, for a total of 100E12 or one hundred trillion inputs. We could use fractal means to generate the original configuration (as biological brains probably do), very low precision arithmetic with statistical rounding, etc., which would reduce each input to just a few bytes to maintain, say ~10. This makes a total of 1E15 or one quadrillion bytes to represent a simulated human's instantaneous state of construction. An entire checkpoint would take little more, because it would only include in addition the electrical state of each of the 100E9 cells. Note however, that the *FUNCTIONAL* state would only be 1/5 of this estimate because 4/5 of the represented inputs are presently inactive, for a total of only 100 terabytes. Note that ~90% of those 100E9 cells are slow-responding glial cells, so while the state is large, the computational requirements may be well short of a petaflop. Of course, this makes a LOT of assumptions that no one has yet bothered to confirm in the laboratory, and I do NOT want to ignite an estimates war, so I invite constructive comments from anyone with more recent data than I have. The Blue Brain project estimates 8000 synapses per neuron in mouse cortex. I haven't seen a more accurate estimate for humans, so your numbers are probably as good as mine. 
I estimate 10^11 neurons, 10^15 synapses (1 bit each) and a response time of 100 ms, or 10^16 OPS to replicate the processing of a human brain. The memory requirement is considerably higher than the information content of long term memory estimated by Landauer [1], about 10^9 bits. This may be due to the constraints of slow neurons, parallelism, and the pulsed binary nature of nerve transmission. For example, the lower levels of visual processing in the brain involve massive replication of nearly identical spot filters which could be simulated in a machine by scanning a small filter coefficient array across the retina. It also takes large numbers of nerves to represent a continuous signal with any accuracy, e.g. fine motor control or distinguishing nearly identical perceptions. However my work with text compression suggests that the cost of modeling 1 GB of text (about one human lifetime's worth) is considerably more than a few GB of memory. My guess is at least 10^12 bits just for ungrounded language modeling. If the model is represented as a set of (sparse) graphs, matrices, or neural networks, that's about 10^13 OPS. Remember that the goal of AGI is not to duplicate the human brain, but to do the work that humans are now paid to do. It still requires solving hard problems like language, vision, and robotics, which consume a significant fraction of the brain's computing power. But what matters is that the cost of AGI be less than human labor, currently US $10K per year worldwide and growing at 3-4% (5% GDP growth - 1.5% population growth). If my guess is right and Moore's law continues (halving costs every 1.5 to 2 years), then AGI is at least 10-15 years away. If it actually turns out there are no shortcuts to simulating the brain, then it is 30 years away. 1. Landauer, Tom, How much do people remember? Some estimates of the quantity of learned information in long term memory, Cognitive Science (10) pp. 477-493, 1986. 
-- Matt Mahoney, [EMAIL PROTECTED]
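The back-of-envelope estimates in this thread can be checked with simple arithmetic. A sketch in Python, using only figures the posters themselves assumed (none of these numbers are measurements):

```python
# Matt's estimate (assumptions stated in the text above)
synapses = 1e15            # ~1e4 synapses per neuron x 1e11 neurons, 1 bit each
response = 0.1             # seconds; ~100 ms neural response time
ops = synapses / response  # operations per second to keep pace -> ~1e16 OPS
synapse_bytes = synapses / 8          # 1-bit synapses -> ~1.25e14 bytes

# Steve's earlier state estimate, for comparison
tracked_inputs = 100e9 * 1e3          # cells x tracked inputs -> 1e14
state_bytes = tracked_inputs * 10     # ~10 bytes per input -> 1e15 (a petabyte)
```

Laid out this way, the two estimates are consistent in order of magnitude: state in the hundreds-of-terabytes-to-petabyte range, and roughly 10^16 OPS if every synapse must be touched once per response time.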
Re: [agi] associative processing
--- Steve Richfield [EMAIL PROTECTED] wrote: The one mentioned in my Comments from a lurker thread mentions Dr. Eliza, that is designed to solve difficult problems in simple ways that billions of people have missed for a million years, and very likely ANY astronomically-sized AGI machine would miss for centuries. It was unclear how AGI was supposed to quickly do something that was only possible after 10E14 human years of wars and other strife, without having to go through, and even potentially cause the same. As far as I can tell, it only gives medical advice based on your personal agenda. It knows only what you program into it. I published a paper at the first IJCNN in San Diego explaining how everything pointed to wet neurons generally computing with the logarithms of probabilities of assertions being true. That simple fact should have guided future research, but lab researchers not being mathematicians, and neither going to NN conferences, this guiding fact as died away like the echo of some long-forgotten noise. When a tree falls in the forest... I use the same technique in my PAQ7/8 data compressors (since Dec. 2005), although I was not aware of your research. A set of models independently estimate the probability p(0), p(1) that the next bit of input will be a 0 or 1 based on past history in various contexts. The predictions are mapped to x = log(p(1)/p(0)), combined by weighted averaging, then mapped by the inverse squashing function 1/(1+exp(-x)), which makes it a neural network. Then the weights are adjusted to favor the most accurate predictions in proportion to x*(actual - predicted), a simplification of back propagation that minimizes coding cost rather than RMS prediction error. I should mention the technique works quite well. 
http://www.maximumcompression.com/data/summary_sf.php -- Matt Mahoney, [EMAIL PROTECTED]
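The logistic mixing Matt describes can be sketched in a few lines. This is a simplified toy version of the scheme as stated in his message -- stretch each model's prediction to the log-odds domain, combine by weighted average, squash back, then adjust weights in proportion to x*(actual - predicted). The function names and learning rate are my assumptions, not PAQ's actual code:

```python
import math

def stretch(p):
    """Map a probability to the log-odds domain: x = ln(p(1)/p(0))."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch: the logistic function 1/(1+exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def mix_and_update(probs, weights, actual_bit, lr=0.01):
    """Combine model predictions and nudge weights toward accurate models."""
    xs = [stretch(p) for p in probs]
    p = squash(sum(w * x for w, x in zip(weights, xs)))  # combined p(bit == 1)
    err = actual_bit - p          # gradient of coding cost, not RMS error
    new_weights = [w + lr * x * err for w, x in zip(weights, xs)]
    return p, new_weights

# Two models predict the next bit is 1 with p = 0.6 and 0.9; the bit is 1,
# so both weights increase, the more confident (more stretched) model's most.
p, w = mix_and_update([0.6, 0.9], [0.5, 0.5], actual_bit=1)
```

Note the design choice Matt mentions: minimizing coding cost rather than RMS error makes the weight update simply x*(actual - predicted) per model, which is what keeps this cheap enough to run per bit of input.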
Re: [agi] database access fast enough?
On 4/18/08, Stephen Reed [EMAIL PROTECTED] wrote: I agree with your side of the debate about whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert. Disk access rate is ~10 times faster than ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the harddisk. Distributive AGI is a fascinating idea, but you have to solve a lot of algorithmic problems to make it work. If each agent has only a slice of the full KB, the average commonsense query would require cooperation among many agents. That's a very challenging algorithmic problem. I'm content to do simple, single-machine AGI. But my opinion of the OpenCyc concept coverage with respect to that of a human five-year old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations: = Yes, I actually agree with you -- I subconsciously tuned down my estimates as I was talking to Mark =) I think sensory processing is going to be a very hard problem, so we should postpone sensory grounding as late as possible, and instead focus on text. Don't forget that the AGI needs to have *episodic* memory as well. If we include that, secondary storage is certainly needed. 
YKY
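YKY's point about commonsense queries spanning many agents can be made concrete with a toy sketch. Everything here (the `Agent` class, the `route` function, the triple format) is hypothetical illustration, not code from any actual AGI project: once a triple store is hash-partitioned across hosts, even a two-step "isa" query must hop between partitions.

```python
# Hypothetical sketch: a knowledge base hash-partitioned across agents.
# All names here are illustrative, not from any real AGI codebase.

class Agent:
    """One host holding a slice of the KB in its own RAM."""
    def __init__(self):
        self.facts = {}                    # subject -> [(predicate, object)]

    def add(self, subj, pred, obj):
        self.facts.setdefault(subj, []).append((pred, obj))

    def lookup(self, subj):
        return self.facts.get(subj, [])

def route(subj, agents):
    """Deterministically map a subject to one agent by hashing it."""
    return agents[hash(subj) % len(agents)]

agents = [Agent() for _ in range(4)]
triples = [("cat", "isa", "mammal"),
           ("mammal", "isa", "animal"),
           ("cat", "has", "fur")]
for s, p, o in triples:
    route(s, agents).add(s, p, o)

def isa(subj, target, agents):
    """Transitive 'isa' query; each recursive step may hit a different agent."""
    for pred, obj in route(subj, agents).lookup(subj):
        if pred == "isa" and (obj == target or isa(obj, target, agents)):
            return True
    return False
```

In a real deployment each `Agent` would live on its own host, so every recursive hop inside `isa` becomes a network round trip -- which is exactly the coordination cost YKY is pointing at.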
Re: [agi] database access fast enough?
On Apr 17, 2008, at 3:32 PM, YKY (Yan King Yin) wrote:

> Disk access rate is ~10 times faster than ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the hard disk.

Eh? Ethernet latency is sub-millisecond, and in a highly tuned system approaches the 10-microsecond range for something local. That is much, much faster than disk if the remote node has your data in RAM and is relatively local. Note that "relatively local" can mean geographically regional. The round-trip RAM access time from my machine to a machine on the other side of town is a fraction of a millisecond over the Internet connection (not hypothetical, actually measured at ~400 microseconds). I wish disk access were even remotely that good. And this was with inexpensive Gigabit Ethernet.

J. Andrew Rogers
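Rogers' round-trip figure is easy to sanity-check in miniature. A rough sketch, assuming nothing beyond the Python standard library; it measures a TCP echo over loopback, so the printed number reflects only the local network stack -- a real cross-town round trip like the one he describes would be higher:

```python
# Sketch: measure mean TCP round-trip time over loopback.
import socket, threading, time

def echo_server(srv):
    conn, _ = srv.accept()
    while True:
        data = conn.recv(64)
        if not data:
            break
        conn.sendall(data)               # echo every message back
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))               # port 0: OS picks a free port
srv.listen(1)
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.socket()
cli.connect(("127.0.0.1", srv.getsockname()[1]))
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # avoid Nagle delay
cli.sendall(b"warmup"); cli.recv(64)     # warm up the connection first

t0 = time.perf_counter()
for _ in range(100):
    cli.sendall(b"ping")
    cli.recv(64)                         # wait for the echo each time
rtt_us = (time.perf_counter() - t0) / 100 * 1e6
print(f"mean round trip: {rtt_us:.0f} microseconds")
cli.close()
```

On typical hardware the loopback round trip lands in the tens of microseconds, consistent with Rogers' claim that a well-tuned network hop is far cheaper than a disk seek.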
[agi] Rationalism and Empirical Rationalism
From: Mark Waser [EMAIL PROTECTED]
Subject: Re: [agi] Rationalism and Scientific Rationalism: Was Logical Satisfiability...Get used to it.

> It looks as if you're saying that scientific rationalism must be grounded but that rationalism in general need not be. Is this a correct interpretation?

No, yes, and I'm not sure. I would like to write a message about an artificial rationalism and an artificial empirical rationalism. I am not going to try to write about an AI architecture, but I do want to write in terms that can lend themselves to a discussion of how rationalism and empirical rationalism can be designed into an AGI program. However, this is not meant as a definitive statement on the various ways that the words and concepts behind 'rationalism' and 'empiricism' are used. (The phrase "empirical rationalism" is probably a better term for me to use than "scientific rationalism".)

But yes, in general, I feel that scientific rationalism and empirical rationalism have to be more grounded than simple rationalism, especially when we are trying to understand how these concepts can be applied to an advanced AGI program. But on the other hand, the concept of grounding may be too strong a term. Think of an AGI program that can learn from natural-language text-based IO but does not have any other kind of IO. I would argue that there has to be a distinction between the definition of rationalism (using some kind of applied logic-based system) and empirical rationalism (which also has some kind of experimental way of grounding ideas and conjectures, and some kind of conceptual integration as well). The problem with this example, however, is that the same conceptual functions are being used to devise conjectures about the IO data environment as are used to test those conjectures. So there is a real question about the depth of the 'grounding', since the problem is so obviously tricky.
It is my belief that while the concept of grounding is important for advanced AGI, it is itself no more solid a premise than the other concepts used in AGI. But I do believe that some kind of 'grounding' is absolutely necessary for it.

Jim Bromer
RE: [agi] database access fast enough?
YKY said:

> The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1000-1 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now.

Don't forget about solid state drives (SSDs). Currently SSDs speed up typical database applications by about 30 times -- and that's without stripping out all the old caching code databases use to handle the order-of-magnitude speed difference between RAM and hard drives. Large storage area network vendors like EMC are looking to SSDs to eliminate IO bottlenecks in corporate applications where large data warehouses reach 20 TB very quickly. And look for capacity to continue to double about every 18 months, driving the price down very quickly. Given their higher reliability and lower energy costs, it won't be too long before hard drives join the ranks of 8-track tape players, record players and 5 1/4" diskettes.

http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci1300939,00.html#
http://www.storagesearch.com/ssd-fastest.html
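YKY's "several years for RAM size to double a few times" can be put in rough numbers. A back-of-envelope sketch -- the 18-24 month doubling period is my own loose Moore's-law-style assumption, not a figure from the thread:

```python
# Back-of-envelope: how many capacity doublings until 10 GB of RAM
# reaches YKY's hypothesis-store estimates of 500 GB and 20 TB?
import math

ram_gb = 10                                   # high-end PC RAM, per the post
targets_gb = {"low end (500 GB)": 500,
              "high end (20 TB)": 20_000}

for label, target in targets_gb.items():
    doublings = math.ceil(math.log2(target / ram_gb))
    # Assume one doubling every 1.5-2 years (an assumption, not a fact).
    print(f"{label}: {doublings} doublings, "
          f"roughly {doublings * 1.5:.0f}-{doublings * 2:.0f} years")
```

Under those assumptions the 500 GB low end needs about 6 doublings (~9-12 years from 2008), while 20 TB needs about 11 (~17-22 years) -- consistent with YKY's "obviously not now".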
Re: [agi] An Open Letter to AGI Investors
> I have stuck my neck out and written an Open Letter to AGI (Artificial General Intelligence) Investors on my website at http://susaro.com. All part of a campaign to get this field jumpstarted. Next week I am going to put up a road map for my own development project.

Hi Richard,

If I were a potential investor, I don't think I'd find your letter convincing.

The term "AI" was first coined some 50 years ago: before I was born, and therefore long before I entered the field of AI. Naturally, I can't speak with personal experience on the matter, but when I read the early literature on AI, or when I read about the field's pioneers reminiscing on the early days, I get the distinct impression that this was an incredibly passionate and excited group. I would feel comfortable calling them a gang of hot-headed revolutionaries -- even today, 50 years after inventing the term AI and at the age of 80, McCarthy writes about AI and the possibility of strong AI with passion and excitement.

Yet, in spite of all the hype, excitement and investment that was apparently around during that time (or, more likely, as a result of the hype and excitement), the field crashed in the AI winter of the 80s without finding that dramatic breakthrough. There's the Japanese Fifth Generation Computer Systems project, which I understand to have been a massive billion-dollar investment during the 80s in parallel machines and artificial intelligence; an investment that is today largely considered a huge failure. And of course, there's Cyc: formed with an inspiring aim to capture all commonsense knowledge, but still in development some 20 years later. And in addition to these, there are the many, many early research papers on AI problem-solving systems that show early promise and cause the authors to make wild predictions and claims in their Future Work sections... predictions that time has reliably proven false.

So, why would I want to invest now?
When I track down the biographies of several of the regulars on this list, I find that they entered the field during or after the AI Winter and never experienced the early optimism as insiders. How can you convince an investor that the passion today isn't just the unfounded optimism of researchers who don't remember the past? How can you convince an investor that AGI isn't also going to devolve into an emphasis on publications rather than quality (as you claim AI has devolved), or into a new kind of weak AGI with no dramatic breakthrough?

I think a better argument would be to point to a fundamental technological or methodological change that makes AGI finally credible. I'm not convinced that being lean, mean, hungry and hellbent on getting results is enough. If I believe in AGI, maybe my best bet is to invest my money elsewhere and wait until the fundamental attitudes have changed, so each dollar will have a bigger impact rather than being squandered on a bad dead-end idea. Alternately, my best bet may be to invest in weak AI, because it will give me a short-term profit (that can be reinvested) AND has a plausible case for eventually developing into strong AI.

If you can offer no good reason to invest in AGI today (given all its past failures), aside from the renewed passion of its researchers, then a sane reader would have to conclude that AGI is probably a bad investment.

Personally, I'm not sure what I feel about AGI (though I wouldn't be here if I didn't think it was valuable and promising). However, in this email I'm trying to play the devil's advocate in response to your open letter to investors.

-Ben