Re: [opencog-dev] neon: a database that works like git

Linas Vepstas Mon, 19 Feb 2018 12:56:32 -0800

On Mon, Feb 19, 2018 at 12:35 PM, Amirouche Boubekki
<[email protected]> wrote:
> Hi Linas,


> RDF triple stores represent a graph, or multiple graphs. The difference in
> terms of storage between a triple store and a property graph is that a triple
> store puts the emphasis on edges. That is, they store and query labeled edges
> called triples.
>
> For instance:
>
>     Bob knows Alice

And here lies the crux of the matter. Mathematicians often describe graphs
as collections of edges and vertexes, and so if the triple store was simply
marketed as "oh hey, we've got this great way of storing labelled edges",
I might react "great", and then compare, feature-by-feature, with other graph
stores.

But instead, the triple stores make this leap into knowledge representation:
"Bob knows Alice" is NOT an example of a labelled edge: it is an example
of represented knowledge.

Can one represent knowledge with graphs? Yes, absolutely.

Is the representation of "Bob knows Alice" by a single edge a good
representation? No -- it's an absolutely terrible representation.

That's where I'm coming from.  Triple stores seem to delight in picking
this truly bad representation, and seeing how far they can go with it.
It does not seem to be a game I want to play.

>> The atomspace was designed to hold arbitrary expressions, like "Adam knows
>> Bob, the curly-haired, fat and always-smiling Bob, and I know this because
>> Carol overheard it while standing in line at the cafeteria for lunch. So
>> I'm 98% certain that Adam knows Bob."
>
>
> That is also possible in a triple store. In a property graph you reify
> hyper-edges as multiple vertices and edges; a similar technique is used in
> triple stores to express something about a particular triple.

Of course it is.  Once you have the ability to talk about edges and vertexes,
then you can create arbitrary graphs, and "reify" all you want. But when
you start doing that, then you are no longer representing "Alice knows Bob"
with a single edge. At which point ...  the jig is up. The pretension, the
illusion that a single edge is sufficient to represent "Alice knows Bob" is
revealed to be a parlor trick.
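
To make the contrast concrete, here is a minimal Python sketch. It is only
an illustration: the Statement class, the reify() helper and the property
names ("confidence", "source", and so on) are hypothetical and are not taken
from any RDF vocabulary or from the atomspace. It shows how the annotated
"Adam knows Bob" example stops being one edge as soon as it is reified.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical annotated statement; not an RDF or atomspace type."""
    subject: str
    predicate: str
    obj: str
    confidence: float = 1.0
    source: str = ""
    attributes: dict = field(default_factory=dict)


def reify(stmt, stmt_id):
    """Expand one annotated statement into plain (s, p, o) triples.

    The statement itself becomes a node (stmt_id), so that other
    triples can say something about it.
    """
    triples = [
        (stmt_id, "subject", stmt.subject),
        (stmt_id, "predicate", stmt.predicate),
        (stmt_id, "object", stmt.obj),
        (stmt_id, "confidence", str(stmt.confidence)),
    ]
    if stmt.source:
        triples.append((stmt_id, "source", stmt.source))
    # Descriptions of the object hang off the object node.
    for key, value in sorted(stmt.attributes.items()):
        triples.append((stmt.obj, key, value))
    return triples


# The "single labelled edge" view.
single_edge = ("Adam", "knows", "Bob")

# The same claim with certainty, provenance and a description of Bob.
annotated = Statement(
    "Adam", "knows", "Bob",
    confidence=0.98,
    source="Carol overheard it in the cafeteria line",
    attributes={"hair": "curly", "mood": "always-smiling"},
)
edges = reify(annotated, "stmt1")

for edge in edges:
    print(edge)
```

The single edge has turned into a small subgraph, and none of its edges is
"Adam knows Bob" any more.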

BTW, here is my take on edges and vertexes, vs. "something better" than
edges and vertexes:

https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/sheaves.pdf


> Basically, what I wanted to know is: as a scientist working with loads of
> structured data, do you feel the need to share and version structured data
> in multiple branches to do your work?

Currently, no. In some not-too-distant future, yes.

One current issue is that some of my non-versioned datasets are too big
to fit into RAM. If they were versioned, then they would be too big to fit,
times ten.

> My take on this would have been "yes" because, for instance, this thread
> makes me think of “Preloading atomspace with initial data”, where you
> explain that the best way to go is to dump postgresql.

Well, that is kind of a "political post" -- some of the users, specifically, the
agi-bio guys, e.g. Mike Duncan, et al. appear to be managing their data as
large text files.  I am encouraging these users to try other ways of managing
their data.  Of course, this then leads to other data-management issues, but
well, one step at a time...

> I have other use cases in mind, like IF (big if, I don't say that's what
> should be done)

The mid-term plan is that it *should* be done.  That's not even the long-term
plan.

> link grammar's and relex dictionaries were stored in a versioned atomspace,

The atomspace already offers several ways of doing versioning, for example,
the "ContextLink", as Nil mentioned, or the "AtTimeLink" -- but, so far, no one
actually uses these for versioning. They are underutilized and under-explored;
there's no hands-on experience of the pros and cons with them.
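
As a rough illustration of the general idea (versioning by filing each
statement under a context), here is a toy Python sketch. It does not use
the atomspace API: ContextStore, its methods, and the context names
"dict-v1" and "dict-v2" are all made up for this example, and the
(word, connector) pairs are placeholder data.

```python
class ContextStore:
    """Toy store: every statement is filed under a named context."""

    def __init__(self):
        self._by_context = {}

    def add(self, context, statement):
        """File one statement under the given context."""
        self._by_context.setdefault(context, set()).add(statement)

    def view(self, *contexts):
        """Return the union of the statements visible in the contexts."""
        visible = set()
        for context in contexts:
            visible |= self._by_context.get(context, set())
        return visible


store = ContextStore()
# Two "versions" of a tiny dictionary, kept side by side.
store.add("dict-v1", ("the", "D+"))
store.add("dict-v2", ("the", "D+"))
store.add("dict-v2", ("a", "D+"))

# What the second version adds relative to the first.
added = store.view("dict-v2") - store.view("dict-v1")
print(added)
```

Nothing is copied or overwritten here; a "version" is just the set of
statements that share a context, and a diff is a set difference.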

> it would
> be easier (I think) to test new grammars...

The reason for placing those grammars into the atomspace is to "cut out the
middleman": when the learning algo learns a new word, then that new word is
instantly available to the parser.

The current process for this is to dump a select portion of the atomspace into
a certain sqlite3 format, copy the sqlite3 to wherever, halt and restart the
link-grammar parser.  It ... "works"... it's clunky.

> BUT (another big one) if AGI projects rely more and more on curated,
> statistically built structured databases that are bigger than RAM, i.e.
> imagine wikidata put together out of unstructured text that must be edited,
> then I think a versioned atomspace and proper tooling make sense (that is,
> a tool like wikibase).

A) I already have databases that don't fit into RAM.  The answer is that the
algos need to be designed to touch the data "locally" - fetch, modify, store.

B) I am deeply distrustful of "curated data", at the low level. In my
experience, it sucks, and I also believe that it is possible to do much better
than curated data, using automatic algos, and that this is "not that hard".
Apparently, no one believes me, so I still have to prove that it can be done.

C) Versioning is a non-issue for the atomspace. We've already got the needed
technology for versioning: e.g. the ContextLink,  the AtTimeLink.  No new
code needs to be written to get versioning.  What is needed, is the use-case,
the actual fooling-with-it scenario.
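
The "fetch, modify, store" pattern from point A can be sketched in a few
lines of Python using the standard sqlite3 module. This is an assumption-
laden illustration, not the atomspace's own storage backend: the "counts"
table, the helper names and the word-pair data are hypothetical. The point
is only that each update touches a single record, so the full dataset never
has to be in RAM.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; pass a file path for on-disk data."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts "
        "(key TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    return conn


def fetch(conn, key):
    """Fetch one record; a missing key counts as zero."""
    row = conn.execute(
        "SELECT n FROM counts WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else 0


def store(conn, key, n):
    """Write one record back."""
    conn.execute(
        "INSERT OR REPLACE INTO counts (key, n) VALUES (?, ?)", (key, n)
    )
    conn.commit()


def observe(conn, key):
    """Fetch, modify, store: only this one record is held in memory."""
    n = fetch(conn, key) + 1
    store(conn, key, n)
    return n


conn = open_store()
for pair in ["the cat", "the dog", "the cat"]:
    observe(conn, pair)

print(fetch(conn, "the cat"))
```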

> Just to be clear: I am no longer planning to replace atomspace anytime
> soon.

I am.  See https://github.com/opencog/atomspace/issues/1502  for details.

Apache Ignite might be the way to go.

> You have invested too much in the atomspace, and are moving toward more
> integration with the atomspace, which makes that unimaginable.

It's very imaginable. The atomspace is both an API and an implementation.
We can keep the API and discard the implementation.
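
The "keep the API, discard the implementation" idea, sketched in Python.
This is a generic illustration, not the atomspace API: GraphStore,
DictStore, EdgeListStore and load_example() are hypothetical names. The
client code only ever talks to the abstract interface, so the backend
underneath can be swapped without touching it.

```python
from abc import ABC, abstractmethod


class GraphStore(ABC):
    """Hypothetical API: the only thing client code is allowed to see."""

    @abstractmethod
    def add(self, source, label, target):
        """Record one labelled link."""

    @abstractmethod
    def links_from(self, source):
        """Return a sorted list of (label, target) pairs leaving source."""


class DictStore(GraphStore):
    """Backend 1: an index keyed by source node."""

    def __init__(self):
        self._index = {}

    def add(self, source, label, target):
        self._index.setdefault(source, set()).add((label, target))

    def links_from(self, source):
        return sorted(self._index.get(source, set()))


class EdgeListStore(GraphStore):
    """Backend 2: a flat list of edges, scanned on every query."""

    def __init__(self):
        self._edges = []

    def add(self, source, label, target):
        edge = (source, label, target)
        if edge not in self._edges:
            self._edges.append(edge)

    def links_from(self, source):
        return sorted((l, t) for s, l, t in self._edges if s == source)


def load_example(store):
    """Client code: written against the API, unaware of the backend."""
    store.add("Adam", "knows", "Bob")
    store.add("Adam", "knows", "Carol")
    store.add("Carol", "overheard", "Adam")
    return store.links_from("Adam")


results = [load_example(backend) for backend in (DictStore(), EdgeListStore())]
print(results[0] == results[1])
```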

> Also, the only improvement would be easier workflows, instead of, for
> instance, dumping the atomspace into wikibase, editing it there, dumping it
> again and loading it into the atomspace, which is not a workflow that is in
> current practice AFAIK. (Also, wikibase is not a good tool to edit arbitrary
> graphs.)

Ah, jeez.  Do you think that google dumps the graph of the internet into
some tool, and then individual humans run around, and edit it node by node?
Like "gee, I should adjust the search ranking for xyz to include http:abc.com
at search rank 2 instead of search rank 3" ... Of course not.

Instead, teams of humans develop algorithms that then perform the edits
automatically, taking millions of cpu-hours on cloud servers.

It is absurd to think that we are going to use human beings to convert the
knowledge of wikipedia into hand-curated triples of the form
"Kennedy#94724379 was#82934872 president#8923423" and then
power some A(G)I with such hand-curated data.  This strikes me as
the ultimate folly, but it seems like the RDF community is chasing this
folly with all its might.

The goal of the atomspace is to allow automated workflows that ingest
and alter data and convert it into formats that other algorithms can act on,
in turn.  The goal of the atomspace is to eliminate human-curated datasets.

--linas


>
>
> Best regards



-- 
cassette tapes - analog TV - film cameras - you

