Hi Mike,

About the FlyBase dataset: I think it provides some important lessons about
dataset size that can help frame thinking about practical applications and
real-world datasets.

But first, a quick review of the dataset. It contains 194 tables and 460
million rows. Using the current design, it takes about 680 GBytes of RAM
to load into the AtomSpace. This works out to about 2.6 Atoms/row -- the
number is this low because most Atoms are shared between many rows. It also
works out to about 570 Bytes/Atom -- this relatively large size is due not
just to the size of the Atoms themselves, but also to the size of the
various indexes that the Atoms live in. (Recall that indexes are used for
rapid traversal of the dataset, e.g. during pattern query.)
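
As a back-of-the-envelope check of those figures, here is a minimal
sketch. It assumes decimal units (1 GByte = 1e9 bytes, 1 TB = 1e12 bytes);
the function name `atomspace_estimate` is made up for illustration and is
not part of any AtomSpace API.

```python
def atomspace_estimate(rows, ram_bytes, atoms_per_row):
    """Return (total Atoms, bytes per Atom) for a loaded dataset."""
    atoms = rows * atoms_per_row
    return atoms, ram_bytes / atoms

# Figures quoted above: 460 million rows, 680 GBytes, 2.6 Atoms/row.
atoms, bytes_per_atom = atomspace_estimate(460_000_000, 680e9, 2.6)

# RAM left over on a 1 TB machine (decimal units assumed).
headroom_gb = (1e12 - 680e9) / 1e9

print(f"{atoms / 1e9:.2f} billion Atoms, {bytes_per_atom:.0f} bytes/Atom")
print(f"{headroom_gb:.0f} GB left on a 1 TB machine")
```

The bytes-per-Atom figure comes out just under 570, which matches the
rounded number above.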

In short, if you can get a cloud instance with 1 TB of RAM, you *can* load
the entire dataset into the AtomSpace, and have lots of room left over. But
this misses the point: there are other datasets out there that are larger.
In the long run, you need to have a strategy for dealing with large
datasets that don't fit in RAM.

That's why the AtomSpace Bridge is called a bridge, and not an importer.
Although you *could* suck the entire dataset across the bridge, if you have
enough RAM, that's not the point. It's more important that the system stay
usable even if you have only limited RAM. (Say, a Raspberry Pi.)

But how? Well, for starters, the API only loads those rows and columns that
you are touching, instead of loading everything.
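
To illustrate the load-on-touch idea in general terms, here is a small
sketch. It is not the bridge API: `LazyTable`, the `gene` table and its
columns are all hypothetical, and an in-memory SQLite database stands in
for Postgres so that the example is self-contained.

```python
import sqlite3

class LazyTable:
    """Fetch a row from the database only when it is first touched."""

    def __init__(self, conn, table, key_column):
        # table and key_column are trusted identifiers in this sketch.
        self.conn = conn
        self.table = table
        self.key_column = key_column
        self.loaded = {}  # key -> row tuple, only for rows touched so far

    def row(self, key):
        if key not in self.loaded:
            cur = self.conn.execute(
                f"SELECT * FROM {self.table} WHERE {self.key_column} = ?",
                (key,),
            )
            self.loaded[key] = cur.fetchone()
        return self.loaded[key]

# Stand-in database with 1000 rows (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [(i, f"gene-{i}") for i in range(1000)],
)

table = LazyTable(conn, "gene", "id")
touched = table.row(42)
print(touched, "rows held in RAM:", len(table.loaded))
```

Only the one row that was touched ends up in the local dictionary; the
other 999 stay in the database until something asks for them.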

If you touch a lot of rows, you may end up loading more than you have RAM
for. This is where the CachingProxyNode comes in. It keeps a list of
recently-used Atoms; if there's no room, it removes the least-recently used
(LRU) Atoms from RAM. Nothing is lost: the evicted Atoms are still there,
on disk or on the network.
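
The LRU idea itself can be sketched in a few lines. This is a generic
illustration, not the CachingProxyNode implementation: the class name
`LRUCache` is made up, and a plain dict stands in for the disk or network
store.

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` items in RAM; evict the least-recently used."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store  # stand-in for disk or network
        self.ram = OrderedDict()      # oldest entry first

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)     # mark as most recently used
            return self.ram[key]
        value = self.backing[key]         # fetch from the backing store
        self.ram[key] = value
        if len(self.ram) > self.capacity:
            self.ram.popitem(last=False)  # drop the LRU entry from RAM only
        return value

backing = {"a": 1, "b": 2, "c": 3}
cache = LRUCache(capacity=2, backing_store=backing)
for key in ["a", "b", "a", "c"]:
    cache.get(key)

print(list(cache.ram))  # keys currently held in RAM
print("b" in backing)   # the evicted item is still in the backing store
```

After touching a, b, a, c with room for only two items, "b" is the one
that gets dropped from RAM, and it can be fetched again from the backing
store whenever it is needed.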

The CachingProxy, together with the other ProxyNodes (e.g. ReadThruProxy
and WriteThruProxy), is how you build a distributed AtomSpace. And a
reminder: this all works now. It's waiting for users.

Examples, including instructions on how to set up Postgres, and how to
download/install FlyBase:

https://github.com/opencog/atomspace-bridge/tree/master/examples

* `basic-demo.scm` -- Demonstrates how to use the basic API to access
  rows, columns and tables mirroring a Postgres database. Includes
  short instructions for how to set up Postgres in the first place.

* `table-browser.scm` -- Demonstrates a simple ASCII-graphics command
  line browser. You can bounce around, from row to row, column to
  column, and explore the dataset. This is a command-line browser,
  not a web-based browser, mostly because the web interfaces to
  the AtomSpace remain unfinished. (Help wanted).

-- Linas

On Mon, Mar 13, 2023 at 10:37 AM Michael Duncan <[email protected]> wrote:

> hi linus, i did successfully try the example code, it worked without a
> hitch.  internally to snet, the team working on importing flybase ran into
> the same problem you did, the result is too large for available machines
> and directly converting from the sql dump to metta is the current
> approach.  i am working on an example in reply to your github issue
> comments, sorry for the delay.
>
> On Monday, March 13, 2023 at 11:01:51 AM UTC-4 Linas Vepstas wrote:
>
>> Hi Ben,
>>
>> Some inline comments below
>>
>> On Sun, Mar 12, 2023 at 1:17 AM Ben Goertzel <[email protected]> wrote:
>>
>>> We expect to be releasing the first alpha versions of the MeTTa
>>> interpreter and Distributed Atomspace "any month now" ... i.e. these
>>> are both pretty close ...
>>>
>>> Efficient concurrent processing for the MeTTa interpreter, via
>>> integration of rholang language on the back end, is slated for early
>>> September... (a development thread led by Greg Meredith)
>>>
>>> Early experimentation is being done with PLN reasoning in the current
>>> pre-alpha MeTTa interpreter, and the Flybase DB of genetic knowledge
>>> about Drosophila Melanogaster is being used as an initial serious use
>>> case for the DAS (because we need reasoning on Flybase for the Rejuve
>>> project's work on Methuselah Fly genomics...)
>>>
>>
>> Mike Duncan asked about importing FlyBase into the AtomSpace; during the
>> Christmas break, I created a bridge for him, here:
>> https://github.com/opencog/atomspace-bridge/ -- It works. I did not get
>> the impression he actually tried it, though. I'm still waiting on
>> meaningful feedback.
>>
>> Re: PLN, I split out the Unifier last summer, and wrapped it with a brand
>> new RuleLink and some other new types. I had the general impression that
>> these would greatly simplify the design of PLN -- see the demos -- but I
>> thought you'd abandoned PLN!? https://wiki.opencog.org/w/RuleLink
>>
>> As to a distributed AtomSpace, there are now things called "ProxyNode"
>> https://wiki.opencog.org/w/ProxyNode which implement different styles of
>> distributed atomspace networking, including read-thru, write-thru, caching,
>> round-robin etc. So, basically, a local AtomSpace can forward requests to
>> other AtomSpaces on the net, in various different ways. These resemble the
>> various RAID modes for disk drives, e.g. RAID-0, RAID-1, but with read and
>> write channels split out separately. And of course, you can stack these,
>> mix-n-match, to build more complex dataflow pipelines. Demo here:
>> https://github.com/opencog/atomspace/blob/master/examples/atomspace/persist-proxy.scm
>>
>> FWIW, There is also a new https://wiki.opencog.org/w/GrantLink which
>> provides a kind-of-like mutex.
>>
>> -- Linas
>>
>>> Broader experimentation w/ implementing AI algorithms in MeTTa should
>>> start after the alpha release, and I expect in the summer or early
>>> fall we will begin applying Hyperon to control groups of agents
>>> cooperating to achieve goals in Minecraft, and to interpret/drive LLMs
>>> with an aim of more truthful question-answering/dialogue ...
>>>
>>>
>>> By mid-Fall 2023 we should be at a stage where we can give more
>>> precise development timelines....   However, after the alpha releases
>>> we will already be well positioned to leverage a variety of open
>>> source community contributors to the project...
>>>
>>> (Apologies for the unsystematic response, but I'm short on time as
>>> usual and figured typing something informal/messy was better than
>>> waiting indefinitely till I found time for a well structured answer
>>> !!)
>>>
>>> -- Ben
>>>
>>>
>>> On Wed, Mar 1, 2023 at 8:59 AM Ivan V. <[email protected]> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > sorry if I digress a bit, but what is the current status of Hyperon?
>>> >
>>> > What are the checkpoints reached by now?
>>> >
>>> > When will it be available for testing?
>>> >
>>> > Thank you in advance for an answer,
>>> > Ivan
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> Groups "opencog" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> > To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/opencog/5be7407e-a643-4b66-a236-e1148cdcc5a4n%40googlegroups.com
>>> .
>>>
>>>
>>>
>>> --
>>> Ben Goertzel, PhD
>>> [email protected]
>>>
>>> "My humanity is a constant self-overcoming" -- Friedrich Nietzsche
>>>
>>>
>>
>>
>> --
>> Patrick: Are they laughing at us?
>> Sponge Bob: No, Patrick, they are laughing next to us.
>>
>>
>

