[opencog-dev] Re: digging into MST-Parser code

Linas Vepstas Wed, 16 Jan 2019 12:55:03 -0800

Hi Anton

On Wed, Jan 16, 2019 at 11:10 AM Anton Kolonin @ Gmail <[email protected]>
wrote:


>
> From a strategic product/service delivery perspective, having the AtomSpace
> as a hypergraph database (storage) layer isolated from the application
> (business logic) layer would make a lot of sense, if that is possible.
>

I think this was always the case. The choice of words "business logic" is
kind-of funny. It's quite accurate, but I've found that there is a class of
catch-phrases that Ben thinks are boring (he will literally walk out of the
room), and I think this is one of them.

We've never tried to make the AtomSpace a business product. We've never
found a way to talk about the atomspace in a business-obvious,
developer-obvious kind of way (compare, again, to grakn.ai). This means that
almost all developers struggle mightily to figure out how to use it, and
almost always fail. Compare, again, to grakn.ai: it's not just tutorials and
demos and examples and documentation; it's the need to create an example that
*everyone* can relate to, and make that the primary example.


> As I see it, one of the problems preventing this nice architectural
> isolation between the layers is the atom type hierarchy, which is bound to
> the low-level implementation of the AtomSpace (storage) concepts on one end
> and to high-level (business logic) aspects like NLP on the other end.
>

What's the problem? Per a different thread, there is a need for a better
FFI (or rather a "foreign type interface" rather than a "foreign function
interface"), but I think this is a well-defined, easily solvable problem that
no one has been interested in solving, until now.

>
> Ideally, I would imagine having the AtomSpace as a C/C++ graph database
> loadable with any atom type hierarchy, isolated from any specific atom
> type hierarchy like the one used in OpenCog.
>

But that is the case already, is it not? For example, agi-bio has its own
hierarchy; the atomspace does not know about it, but it can store it and
load it and pattern-match it and backward-chain with it just fine.

My question might be rhetorical; I know of many things that are wrong,
incomplete, poorly implemented ... but it is hard to figure out which of
these are important, and which are not.  So you'd have to make clear which
ones are the important ones.
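To make the agi-bio point concrete, here is a toy Python sketch (not the AtomSpace's C++ or Scheme API) in which the store never hardcodes a type hierarchy: types are registered at runtime, and only type-aware queries consult the registry. `TypeRegistry`, `Store`, and the `GeneNode` type are hypothetical names chosen for illustration.

```python
class TypeRegistry:
    """Hypothetical runtime type hierarchy: each type name maps to its parent."""

    def __init__(self):
        self.parent = {"Atom": None}  # the root type

    def add_type(self, name, parent="Atom"):
        if parent not in self.parent:
            raise KeyError(f"unknown parent type {parent!r}")
        self.parent[name] = parent

    def is_a(self, name, ancestor):
        # Walk up the parent chain until we reach the ancestor or fall off the root.
        while name is not None:
            if name == ancestor:
                return True
            name = self.parent[name]
        return False


class Store:
    """Hypothetical storage layer: stores (type, name) pairs without interpreting them."""

    def __init__(self, registry):
        self.registry = registry
        self.atoms = set()

    def add(self, atom_type, name):
        # Types must be registered, but any hierarchy at all will do.
        if atom_type not in self.registry.parent:
            raise KeyError(f"unregistered type {atom_type!r}")
        self.atoms.add((atom_type, name))

    def of_type(self, ancestor):
        # Only queries consult the hierarchy; storage itself is type-agnostic.
        return sorted(a for a in self.atoms if self.registry.is_a(a[0], ancestor))


registry = TypeRegistry()
registry.add_type("Node")
registry.add_type("GeneNode", parent="Node")  # an application-defined type
store = Store(registry)
store.add("GeneNode", "BRCA1")
store.add("Node", "x")
```

Here `store.of_type("Node")` returns both atoms, because `GeneNode` was declared a subtype of `Node` at runtime; the `Store` class itself never changed.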


> On top of this, there could be separate projects and any applications
> and scripts in any language, such as Scheme or C/C++ or whatever,
> loading any atom type hierarchies into it, with any AGI/NLP/etc. applications.
>

I think that has been possible since about "forever", so you would have to
give a more detailed example.

Now, there are many things that one could do to make the atomspace
better/easier for "ordinary" users. I've thought a lot about these. But
doing so takes focus and effort.  The historical focus has been on PLN and
various conceptions of AGI, and essentially zero focus on "normal"
applications.

>
> But there is a more important thing that concerns me regarding the
> architecture. To my understanding, unlike any conventional database
> used in industry, OpenCog is not meant to work in multi-user
> environments. For instance, if you have an SQL table about animals, you may
> have multiple users querying different segments of the table related to
> different animals.
>

Do you mean "access permissions"? Read-write? There's some minimal support
for that; you can have a read-only atomspace (e.g. some huge genome
dataset) and then a read-write layer on top of it (so that some scientist
can modify portions of the dataset, without screwing up the total, and
without having to make a private, personal copy of the huge dataset.) This
works now, but it's minimal; no fancy features.
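A read-only base with a read-write layer on top behaves much like a layered lookup table. The Python sketch below illustrates only that layering idea, using the standard library's `ChainMap` and `MappingProxyType`; it is not how the AtomSpace implements it, and the gene keys and values are made up.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical shared dataset, wrapped read-only so that no user can modify it.
genome = MappingProxyType({"gene:BRCA1": 0.9, "gene:TP53": 0.8})

# One scientist's private read-write layer; ChainMap writes only to its first map.
workspace = ChainMap({}, genome)

workspace["gene:TP53"] = 0.5    # shadows the shared value in the private layer only
workspace["gene:NEW1"] = 0.1    # new entry, visible only through this workspace
```

Lookups fall through to the shared data, writes land in the private layer, and the base is never copied.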

If you mean "atomic update", then no; the atomspace is more BASE-like than
ACID-like. This could be interesting to talk about.

If you mean "table schemas", we've got a prototype of that, called "deep
types".  In SQL you must always have a table schema.  In Prolog/Datalog,
you never need a schema (and I'm not sure it is even possible to specify a
schema in those languages). I promise not to mention XML schema. Ooops.
Same idea - you can write XML without a schema, but there are people who
insist that their app has to have one, and so -- XML schemas.  We have a
sketch for that in the atomspace -- see the wiki page on type
constructors.  The basics work. No one actually uses it for anything.

If you mean inner and outer joins, the pattern matcher already does that,
automatically.
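Joins come "for free" because a variable shared between two clauses must be bound to the same value in both. The toy Python matcher below works over flat triples rather than AtomSpace hypergraphs and shows the inner-join case only; `match`, `query`, the `$` prefix for variables, and the example facts are all invented for illustration and are not the pattern matcher's API.

```python
def match(clause, fact, binding):
    """Try to extend `binding` so that `clause` equals `fact`; variables start with '$'."""
    if len(clause) != len(fact):
        return None
    b = dict(binding)
    for c, f in zip(clause, fact):
        if c.startswith("$"):
            if b.setdefault(c, f) != f:
                return None  # variable already bound to a different value
        elif c != f:
            return None
    return b


def query(clauses, facts):
    """Return all bindings satisfying every clause; shared variables act as join keys."""
    bindings = [{}]
    for clause in clauses:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(clause, f, b)) is not None]
    return bindings


facts = [
    ("lives_in", "alice", "paris"),
    ("lives_in", "bob", "rome"),
    ("capital_of", "paris", "france"),
]
# "$city" appears in both clauses, so the result is an inner join on it.
result = query([("lives_in", "$who", "$city"), ("capital_of", "$city", "$country")], facts)
```

Only Alice survives: Bob's city has no matching `capital_of` fact, exactly as an inner join would drop him.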

>
> Seemingly, it does not work the same way in OpenCog: if two independent
> users start MST-parsing on two different corpora, their data will get
> mixed up together,


Why? Open a bug; this should work perfectly. Once, long ago, I ran
parsing in parallel on 3 different machines; the data was not "messed up"; it
summed up very nicely.

I have not actually tried this (or even thought about it) with the current
pipeline in opencog/nlp, so yes, there may be bugs. They should be fixed.
There's a potential performance penalty from syncing too often.  There
might be issues with atomic updates; I don't think we have atomic counters
fully implemented (work for that was started, but not finished) but you can
certainly do language learning without atomic counters.
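One way parallel runs can "sum up nicely" without atomic counters is for each worker to keep private counts and merge them afterwards, since addition does not care about order. The Python sketch below uses adjacent word pairs as a simplified stand-in for MST-parser pair counts; `count_pairs` and the sample corpora are hypothetical, and this is not a description of the opencog/nlp pipeline.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def count_pairs(corpus):
    """Count adjacent word pairs in one corpus (a stand-in for parse pair counting)."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


corpora = [
    ["the cat sat", "the cat ran"],
    ["the dog sat"],
    ["a cat sat"],
]

# Each worker keeps private counts, so no shared counter is updated concurrently.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(count_pairs, corpora))

# Counter addition is commutative and associative: merge order does not matter.
total = reduce(add, partials, Counter())
```

The merged `total` equals what a single sequential pass over all corpora would produce; the price is a merge step instead of per-update synchronization.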


> if they start inference or pattern-matching activity
> on different topics, the topics will get mixed up together.


Huh? This should work perfectly. Open a bug.


> The way it
> is supposed to be solved is by having a different AtomSpace for each corpus
>

??

> or for each inference process, but AtomSpaces are really heavyweight
> and you cannot create AtomSpaces dynamically for user sessions.
>

?? Why can't you create atomspaces dynamically?
cog-new-atomspace
cog-push-atomspace
cog-pop-atomspace
cog-atomspace-readonly?
cog-set-atomspace!

and 8 more of these kinds of functions.
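Assuming the push/pop pair behaves as a stack of temporary child atomspaces, where a pushed child still sees its parent's contents and popping discards the child, that semantics can be modeled in a few lines of Python with `ChainMap.new_child()`. The class `AtomSpaceStack` and the atom keys are hypothetical; this is a model of the idea, not the Guile API.

```python
from collections import ChainMap


class AtomSpaceStack:
    """Hypothetical model of push/pop semantics; not the OpenCog API."""

    def __init__(self):
        self._stack = [ChainMap({})]  # the base atomspace

    @property
    def current(self):
        return self._stack[-1]

    def push(self):
        # New, empty child layer; lookups fall through to the parent layers.
        self._stack.append(self.current.new_child())
        return self.current

    def pop(self):
        # Discard the top layer and everything that was added to it.
        if len(self._stack) == 1:
            raise IndexError("cannot pop the base atomspace")
        return self._stack.pop()


spaces = AtomSpaceStack()
spaces.current["ConceptNode:cat"] = 1          # lives in the base layer
scratch = spaces.push()
scratch["ConceptNode:temp-result"] = 42        # lives only in the child layer
seen_in_child = "ConceptNode:cat" in scratch   # parent atoms are visible
spaces.pop()
```

Creating a child here costs one empty dict, which is in the spirit of the point that creating an atomspace dynamically is cheap.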



> Well, we may have a pool of N AtomSpaces serving a queue of M users, so if
> N < M then M-N users wait in the queue. But I anticipate that context
> AtomSpace initialization for every user coming from the queue could be
> as expensive as creating a new AtomSpace for every user...
>

I don't understand. You can do this just fine. There's already a pool of
temporary atomspaces. You'd have to define what "expensive" means.  I think
you can create an atomspace in milliseconds or maybe tens of milliseconds
at most.
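For the N-spaces, M-users case, a generic resource pool is enough. The Python sketch below is not OpenCog's existing pool of temporary atomspaces; plain dicts stand in for atomspaces, and `AtomSpacePool` and `session` are hypothetical names. Users beyond the first N simply block in `acquire()` until a space is released, and the per-user "initialization" cost in this toy is just clearing whatever the previous session added.

```python
import queue
import threading


class AtomSpacePool:
    """Hypothetical pool of N reusable scratch spaces shared by many user sessions."""

    def __init__(self, n):
        self._free = queue.Queue()
        for _ in range(n):
            self._free.put({})  # a plain dict stands in for an empty atomspace

    def acquire(self):
        return self._free.get()  # blocks while all N spaces are in use

    def release(self, space):
        space.clear()  # reset per-user state before reuse
        self._free.put(space)


pool = AtomSpacePool(n=2)
results = {}
lock = threading.Lock()


def session(user):
    space = pool.acquire()
    try:
        space[f"query-by-{user}"] = user  # per-user scratch data
        with lock:
            results[user] = sorted(space)  # record what this user could see
    finally:
        pool.release(space)


threads = [threading.Thread(target=session, args=(u,)) for u in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Five users share two spaces, and each user sees only its own entry because spaces are cleared before being handed out again.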

Destroying a database with millions of atoms in it is slow ... but that is a
different issue.


>
> This is something to address before considering exposing OpenCog-based
> services to SingularityNET.
>

There are 1001 things that generic databases do that the atomspace does not
do, or at least not efficiently, quickly, or easily. It would be nice to
have those features. Up until now, there has been very low demand for
these, because the user base is tiny.

There is one very, very important issue that is being ignored: we need to be
able to load the ghost rules for Sophia much more quickly than we do. Last
time someone measured, it was unacceptably slow. I'm not sure what the
status of that is.

-- Linas



-- 
cassette tapes - analog TV - film cameras - you
