Oh, and I forgot to mention: opencog atoms are small -- maybe a few
hundred bytes or so. The performance of most popular web databases totally
sucks when the data is that small -- they are tuned for storing MP3s and
JPEG files, which are megabytes each, and they are great for that -- but
they suck for teeny-weeny atoms. Been there, done that.

--linas

On Sun, Oct 1, 2017 at 11:39 PM, Linas Vepstas <[email protected]> wrote:

> Hi Amirouche,
>
> Let me top-post, it will be easier. First: bulk load and bulk save of the
> atomspace is part of the API, but it's very blunt and ugly and useless. I
> never-ever bulk-load or bulk-save my data.
>
> The more fine-grained API allows:
> * Specific atoms to be loaded (i.e. the values, truthvalues, etc. attached
>   to those atoms).
> * The entire incoming set of a specific atom to be loaded.
> * Load only that portion of the incoming set that is some specific type.
> * Load all atoms of a specific type.
> * Save just one specific atom.
>
> Let me give an example: So, first, I load all atoms of type WordNode.
> There are maybe 100K or 200K of these, it depends. Next, I pick one word,
> let's say (WordNode "the") and load all SectionLinks with that word in it.
> (Sections are link-grammar disjuncts). For a word like "the", there might
> be 20K or 50K or maybe more sections. By loading only the SectionLinks, I
> can avoid loading the word-pairs (of which one word is "the"), because I
> don't need the word-pairs, and there's like maybe 100K of them that I don't
> need clogging up RAM. Then I run my algo, and then pick a different word,
> and repeat. Pretty much all words have far fewer sections than "the".
> The total number of sections is maybe 25 million or double or one-tenth
> of that (it depends), which is probably too much to load all at the same
> time. I don't really need all 25M at the same time.
>
> So how can wiredtiger help? To summarize, here's what I got:
>
> So my algo knows exactly which atoms it wants loaded at any given time,
> and I can also provide fairly strong hints about which ones are no longer
> needed.
>
> I absolutely, totally must have these certain kinds of atoms loaded at
> certain times, otherwise the algo totally fails. The atomspace API allows
> me to ask for exactly those atoms that I want, when I want them. The
> current API stalls (does not return to caller) until the requested atoms
> are fully loaded in the atomspace. For all I care, the loading could be
> done async, BUT the atoms must be there when they are accessed. (We would
> need to change the API to do this kind of async load, but that's doable.
> Hmmm. Good idea, even, I should have done this earlier....)
>
> Maybe with wiredtiger, we could make loading async, so that the atoms
> are not fully loaded until they are accessed. I'm not picky. I can give
> hints about which ones to load, when.
>
> I have no clue how wiredtiger works, so I don't really know what to
> suggest to you. I can only point you at the current, actual API and its
> documentation. It is here. If you want a different API, that's OK, I'm OK
> with that, as long as it can actually work. I do NOT need crazy ideas that
> will never work.
>
> The API is here:
>
> https://github.com/opencog/atomspace/blob/master/opencog/atomspace/BackingStore.h
>
> an example implementation is here:
>
> https://github.com/opencog/atomspace/blob/master/opencog/persist/sql/multi-driver/SQLAtomStorage.cc
>
> If you try to figure out how the one is wired to the other, you will get
> confused; there is some historical perversity that makes it more stupidly
> complicated than it should be. Oh well. Just skip that part.
>
> So I will help you make wiredtiger work, if you explain to me how it can
> "magically" load the needed atoms at the right time. Because otherwise, it
> just seems like magic to me.
>
> If we can expose whizzy features in wiredtiger, that's fine too, but I
> have no clue about that.
>
> --linas
>

--
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has."*
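
For readers following the thread, the access pattern described in the quoted
message (load every WordNode, then for one word at a time fetch only the
SectionLinks that contain it, run the algorithm, and let those atoms go) might
be sketched roughly as below. This is only an illustration under stated
assumptions: `Atom`, `ToyBackingStore`, `load_type`, `load_incoming_by_type`,
`make_demo_store` and `count_sections_per_word` are hypothetical names invented
for this sketch, not the real BackingStore.h interface, and the "disk" is just
an in-memory vector. As with the current API described above, each fetch
blocks until the requested atoms are available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy atom; not the real AtomSpace Atom class.
struct Atom {
    std::string type;                // e.g. "WordNode", "SectionLink"
    std::string name;                // label, unique within this demo
    std::vector<std::string> words;  // for links: the words the link contains
};

// Hypothetical stand-in for a persistent store; the "disk" is a vector.
class ToyBackingStore {
public:
    explicit ToyBackingStore(std::vector<Atom> atoms) : disk_(std::move(atoms)) {}

    // Fetch every atom of one type (blocking).
    std::vector<Atom> load_type(const std::string& type) const {
        std::vector<Atom> out;
        for (const Atom& a : disk_)
            if (a.type == type) out.push_back(a);
        return out;
    }

    // Fetch only the links of `link_type` that contain `word` (blocking).
    // Links of any other type are never copied out of the store.
    std::vector<Atom> load_incoming_by_type(const std::string& word,
                                            const std::string& link_type) const {
        assert(!word.empty());
        std::vector<Atom> out;
        for (const Atom& a : disk_) {
            if (a.type != link_type) continue;
            for (const std::string& w : a.words)
                if (w == word) { out.push_back(a); break; }
        }
        return out;
    }

private:
    std::vector<Atom> disk_;
};

// Tiny made-up dataset: two words, three sections, one word-pair.
ToyBackingStore make_demo_store() {
    std::vector<Atom> atoms = {
        {"WordNode", "the", {}},
        {"WordNode", "cat", {}},
        {"SectionLink", "sec-the-1", {"the"}},
        {"SectionLink", "sec-the-2", {"the"}},
        {"SectionLink", "sec-cat-1", {"cat"}},
        {"WordPairLink", "pair-the-cat", {"the", "cat"}},
    };
    return ToyBackingStore(std::move(atoms));
}

// Visit each word in turn, holding only that word's sections in memory.
// Counting the sections stands in for the real per-word algorithm.
std::vector<std::pair<std::string, std::size_t>>
count_sections_per_word(const ToyBackingStore& store) {
    std::vector<std::pair<std::string, std::size_t>> result;
    const std::vector<Atom> words = store.load_type("WordNode");
    for (const Atom& word : words) {
        std::vector<Atom> sections =
            store.load_incoming_by_type(word.name, "SectionLink");
        result.emplace_back(word.name, sections.size());
    }   // `sections` is destroyed each iteration, releasing that word's atoms
    return result;
}
```

With the demo data, `count_sections_per_word(make_demo_store())` reports two
sections for "the" and one for "cat"; the word-pair link is never loaded. An
async variant would hand out a handle at fetch time and block only on first
access, which is the API change the message says would be needed.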
