> While I was writing the last message to Matt, I realized that having any DB
> at all is very nearly pointless. That having a DB-backed storage is very
> nearly an anti-pattern.
>
> The only useful function that a DB seems to provide is to be able to say "get
> me this particular atom X from the disk". But how often do you need that?
> Far more typical is that you want to load zillions of atoms, and do
> something with them. If zillions of atoms are too big to fit in RAM, then we
> are back to the chunking problem that started this conversation, and the
> chunking problem has nothing to do with databases.
>
> The only other advantage of databases is incremental backup -- for multi-day,
> multi-week-long calculations, you want to save partial results, one atom at a
> time.
This is almost right, Linas, but just a little too extreme...

Let's think e.g. about the genomics use-case. Consider the following situation: an OpenCog system is thinking hard about, say, 100 human genes at a time, building new links connecting them to various concepts and predicates etc. Then it saves its conclusions to a backing-store DB -- and moves on to the next batch of human genes.

But while thinking about gene G, OpenCog may relate it to gene H, and may then want to grab information from the backing-store about gene H... In this case the "chunk" of information that we want to grab from the backing-store is "the sub-metagraph giving the most intensely relevant information about gene H"...

Note that the chunk related to gene H, desired on a certain occasion, may overlap with the chunk related to gene H1... or the chunk related to GO category GO7... desired on other occasions...

So I think it's a correct point that

-- the quantity of Atom-stuff to be sucked out of the BackingStore into the Atomspace will almost always be a "chunk" of Atoms rather than an individual Atom

However, I think these chunks are not always going to be extremely huge (they could be 100s of Atoms sometimes, or 1000s sometimes, not always hundreds of thousands or millions...)... and also the chunks needed are going to overlap with each other in ways that can't be foreseen in advance.

Thus I believe that we need some fairly powerful static pattern matching operating against the BackingStore, and that a primary operation to focus on is:

-- send a Pattern Matcher query to the BackingStore
-- send the Atom-chunk resulting from the query to the Atomspace

This is pretty clearly what is needed in the genomics use-case. But I could come up with similar stories for other use-cases, e.g.
if an OpenCog-controlled robot meets a person "Piotr Walarz" for the first time, it may wish to fish into the BackingStore to pull in a whole bunch of nodes and links comprising previously ingested or inferred knowledge about "Piotr Walarz"... This will be a sizeable chunk, but maybe:

-- if the AI's knowledge about "Piotr Walarz" comes from online profiles etc., this could be a chunk of 100s to 1000s to 10000s of Atoms...

-- if the robot, or other robots sharing the same KB, have had a lot of direct interaction with "Piotr Walarz", then it could be a much larger chunk... which may need to get fished into RAM only partially and in multiple stages...

-- Ben
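The two-step operation proposed above -- send a Pattern Matcher query to the BackingStore, then send the resulting Atom-chunk to the Atomspace -- can be sketched in miniature. This is a toy illustration only, assuming nothing about the real OpenCog API: the `BackingStore` and `AtomSpace` classes, their methods, and the gene names are all hypothetical stand-ins.

```python
# Toy sketch of: (1) pattern-query the disk-side BackingStore for the
# sub-metagraph around a seed atom, (2) load that chunk into the in-RAM
# AtomSpace. All names are illustrative, not the real OpenCog API.

class BackingStore:
    """Disk-side store (here just in-memory dicts for illustration)."""
    def __init__(self, atoms, links):
        self.atoms = atoms   # {atom name: atom type}
        self.links = links   # list of sets of atom names (one per link)

    def query(self, seed_name):
        """Return the chunk of atoms directly linked to seed_name --
        a crude stand-in for 'the sub-metagraph giving the most
        intensely relevant information about gene H'."""
        names = {seed_name}
        for link in self.links:
            if seed_name in link:
                names |= link
        return {n: self.atoms[n] for n in names if n in self.atoms}

class AtomSpace:
    """In-RAM AtomSpace, reduced to a dict of loaded atoms."""
    def __init__(self):
        self.atoms = {}

    def load_chunk(self, chunk):
        # Overlapping chunks merge naturally: re-loading an atom
        # already in RAM just overwrites it with the same entry.
        self.atoms.update(chunk)

store = BackingStore(
    atoms={"geneG": "GeneNode", "geneH": "GeneNode",
           "geneH1": "GeneNode", "GO7": "ConceptNode"},
    links=[{"geneG", "geneH"}, {"geneH", "GO7"}],
)
atomspace = AtomSpace()
atomspace.load_chunk(store.query("geneH"))  # pull in the geneH chunk
print(sorted(atomspace.atoms))              # -> ['GO7', 'geneG', 'geneH']
```

Note how `geneH1` stays on disk: only the chunk relevant to the current query is pulled into RAM, and a later query (say, for `geneG` or `GO7`) would fetch a chunk that overlaps the one already loaded, which is exactly the overlapping-chunks behavior described above.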
