> Is the representation of "Bob knows Alice" by a single edge a good representation? No -- it's an absolutely terrible representation.

Indeed. A Wikidata-style triple is not knowledge representation, it is knowledge expression. The representation is somewhere else, and Wikidata will never be allowed to be expressive enough to actually represent knowledge.

> The goal of the atomspace is to eliminate human-curated datasets.

Music to my ears. "Curated" means "detached from the actual source and context of knowledge." With OpenCog I imagine that the AtomSpace will connect to the original video feed (maybe from Sophia stored in a SingularityNET distributed file store) plus the algorithms which resulted in the knowledge representation. This way anyone (and Sophia herself when she needs to) can replay the algorithm on the original observation and analyze how the knowledge is derived, possibly correcting/improving it.

- Jeff

On 2018/02/19 21:55, Linas Vepstas wrote:
On Mon, Feb 19, 2018 at 12:35 PM, Amirouche Boubekki <amirouche.boube...@gmail.com> wrote:
Hi Linas,
RDF triple stores represent a graph, or multiple graphs. The difference in terms of storage between a triple store and a property graph is that a triple store puts the emphasis on edges. That is, they store and query labeled edges, called triples.

For instance:

     Bob knows Alice
And here lies the crux of the matter. Mathematicians often describe graphs
as collections of edges and vertexes, and so if the triple store was simply
marketed as "oh hey, we've got this great way of storing labelled edges",
I might react "great", and then compare, feature-by-feature, with other graph
stores.

But instead, the triple stores make this leap into knowledge representation:
"Bob knows Alice" is NOT an example of a labelled edge: it is an example
of represented knowledge.

Can one represent knowledge with graphs? Yes, absolutely.

Is the representation of "Bob knows Alice" by a single edge a good representation?
No -- it's an absolutely terrible representation.

That's where I'm coming from.  Triple stores seem to delight in picking
this truly bad representation, and seeing how far they can go with it.
It does not seem to be a game I want to play.
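To make the point concrete, here is a minimal sketch (plain Python tuples, not any particular triple-store API; the `context` fields are illustrative) of what a triple store actually records for "Bob knows Alice", and of everything the single edge cannot hold:

```python
# What the triple store records: one labelled edge, nothing more.
triple = ("Bob", "knows", "Alice")  # (subject, predicate, object)

# Everything that makes this *knowledge* has no place in the edge itself:
# who observed it, how confidently it is believed, and when it held.
# (These field names are made up for illustration.)
context = {
    "source": "overheard by Carol",  # provenance
    "confidence": 0.98,              # degree of belief
    "time": "2018-02-19",            # when it was believed to hold
}

# A bare (s, p, o) tuple can carry none of these fields; attaching them
# forces the reification machinery discussed below.
assert len(triple) == 3
```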

The atomspace was designed to hold arbitrary expressions, like "Adam knows Bob, the curly-haired, fat and always-smiling Bob, and I know this because Carol overheard it while standing in line at the cafeteria for lunch. So I'm 98% certain that Adam knows Bob."

That is also possible in a triple store. In a property graph you reify hyper-edges as multiple vertices and edges; a similar technique works in triple stores to express something about a particular triple.
Of course it is.  Once you have the ability to talk about edges and vertexes, then you can create arbitrary graphs, and "reify" all you want. But when you start doing that, then you are no longer representing "alice knows bob" with a single edge. At which point the jig is up: the pretension, the illusion that a single edge is sufficient to represent "alice knows bob" is revealed to be a parlor trick.
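A rough sketch of what reification costs, modelled loosely on RDF-style reification (the node and predicate names here are illustrative, not a real vocabulary): to say anything *about* the edge "Bob knows Alice", the single triple explodes into a bundle of triples describing a statement node.

```python
# One statement node ("stmt1", a made-up identifier) stands in for the
# original edge, so that provenance and confidence can be attached to it.
reified = [
    ("stmt1", "type",       "Statement"),
    ("stmt1", "subject",    "Bob"),
    ("stmt1", "predicate",  "knows"),
    ("stmt1", "object",     "Alice"),
    ("stmt1", "source",     "Carol"),  # now we can attach provenance...
    ("stmt1", "confidence", 0.98),     # ...and a degree of belief
]

# One "edge" has become six. The claim that a single edge represents
# "Bob knows Alice" only holds until you need to say anything about it.
assert len(reified) == 6
```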

BTW, here is my take on edges and vertexes, vs. "something better" than
edges and vertexes:

https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/sheaves.pdf


Basically, what I wanted to know is: as a scientist working with loads of structured data, do you feel the need to share and version structured data in multiple branches to do your work?
Currently, no. In some not-too-distant future, yes.

One current issue is that some of my non-versioned datasets are too big
to fit into RAM. If they were versioned, then they would be too big to fit,
times ten.

My take on this would have been "yes" because, for instance, this thread makes me think of the “Preloading atomspace with initial data” thread, where you explain that the best way to go is to dump postgresql.
Well, that is kind of a "political post" -- some of the users, specifically, the
agi-bio guys, e.g. Mike Duncan, et al. appear to be managing their data as
large text files.  I am encouraging these users to try other ways of managing
their data.  Of course, this then leads to other data-management issues, but
well, one step at a time...

I have other use cases in mind, like IF (big if, I don't say that's what should be done)
The mid-term plan is that it *should* be done.  That's not even the long-term
plan.

link-grammar's and relex's dictionaries were stored in a versioned atomspace,
The atomspace already offers several ways of doing versioning, for example the "ContextLink", as Nil mentioned, or the "AtTimeLink" -- but, so far, no one actually uses these for versioning; they are underutilized and under-explored, and there's no hands-on experience of their pros and cons.
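A rough sketch of the versioning-by-context idea, with Atomese modelled as nested Python tuples. (The real AtomSpace API is Scheme/C++; only the link names ContextLink and AtTimeLink come from the discussion above, and the constructors here are toy stand-ins.)

```python
# Toy constructors: each returns a nested tuple standing in for an Atom.
def ContextLink(context, body):
    return ("ContextLink", context, body)

def EvaluationLink(pred, *args):
    return ("EvaluationLink", pred, args)

# The same fact, scoped to two different "versions" of the dictionary:
v1 = ContextLink("grammar-v1", EvaluationLink("knows", "Bob", "Alice"))
v2 = ContextLink("grammar-v2", EvaluationLink("knows", "Bob", "Alice"))

# A query for facts valid in one version just filters on the context,
# so no separate versioning machinery is needed:
facts = [v1, v2]
in_v1 = [f for f in facts if f[0] == "ContextLink" and f[1] == "grammar-v1"]
assert in_v1 == [v1]
```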

it would be easier (I think) to test new grammars...
The reason for placing those grammars into the atomspace is to "cut out the
middleman": when the learning algo learns a new word, then that new word is
instantly available to the parser.

The current process for this is to dump a select portion of the atomspace into a certain sqlite3 format, copy the sqlite3 file to wherever, then halt and restart the link-grammar parser.  It ... "works" ... it's clunky.
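A hypothetical sketch of that dump step. (The real link-grammar sqlite3 schema is not reproduced here; the table name, column names, and disjunct string below are all made up for illustration.)

```python
import sqlite3

def dump_dictionary(entries, path):
    """Write (word, disjuncts) pairs to an sqlite3 file for the parser."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS morphemes "
                "(morpheme TEXT, disjuncts TEXT)")
    con.executemany("INSERT INTO morphemes VALUES (?, ?)", entries)
    con.commit()
    con.close()

# Dump one (illustrative) dictionary entry to a file...
dump_dictionary([("cat", "Ds- & Ss+")], "/tmp/dict.db")
# ...after which the parser must be halted and restarted to pick up the
# new file: exactly the clunky middleman step the atomspace route avoids.
```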

BUT (another big if): if AGI projects rely more and more on statistically built, curated structured databases that are bigger than RAM (i.e. imagine wikidata put together out of unstructured text that must be edited), then I think a versioned atomspace and proper tooling make sense (that is, a tool like wikibase).
A) I already have databases that don't fit into RAM.  The answer is that the
algos need to be designed to touch the data "locally" - fetch, modify, store.
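The fetch-modify-store pattern can be sketched as follows. (The function and class names here are illustrative, not the AtomSpace API; the point is only that the algo touches one item at a time instead of loading the whole dataset.)

```python
def process_locally(store, keys, update):
    """Touch the data 'locally': fetch, modify, store, one item at a time."""
    for key in keys:
        atom = store.fetch(key)    # pull only what is needed into RAM
        atom = update(atom)        # modify it
        store.store(key, atom)     # push the result back out

class DictStore:
    """Toy backing store standing in for an out-of-RAM database."""
    def __init__(self):
        self.d = {}
    def fetch(self, k):
        return self.d.get(k, 0)
    def store(self, k, v):
        self.d[k] = v

s = DictStore()
process_locally(s, ["a", "b"], lambda x: x + 1)
assert s.d == {"a": 1, "b": 1}
```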

B) I am deeply distrustful of "curated data", at the low level. In my experience, it sucks. I also believe that it is possible to do much better than curated data, using automatic algos, and that this is "not that hard".  Apparently, no one believes me, so I still have to prove that it can be done.

C) Versioning is a non-issue for the atomspace. We've already got the needed
technology for versioning: e.g. the ContextLink,  the AtTimeLink.  No new
code needs to be written to get versioning.  What is needed, is the use-case,
the actual fooling-with-it scenario.

Just to be clear: I am no longer planning to replace the atomspace anytime soon.
I am.  See https://github.com/opencog/atomspace/issues/1502  for details.

Apache Ignite might be the way to go.

You have invested too much in the atomspace, and are moving toward more integration with it, which makes that unimaginable.
It's very imaginable. The atomspace is both an API and an implementation.
We can keep the API and discard the implementation.

Also, the only improvement would be easier workflows, instead of, for instance, dumping the atomspace into wikibase, editing it there, then dumping it again and loading it back into the atomspace -- which is not a workflow in current practice, AFAIK. (Also, wikibase is not a good tool for editing arbitrary graphs.)
Ah, jeez.  Do you think that google dumps the graph of the internet into
some tool, and then individual humans run around, and edit it node by node?
Like "gee, I should adjust the search ranking for xyz to include http:abc.com
at search rank 2 instead of search rank 3" ... Of course not.

Instead, teams of humans develop algorithms that then perform the edits
automatically, taking millions of cpu-hours on cloud servers.

It is absurd to think that we are going to use human beings to convert the
knowledge of wikipedia into hand-curated triples of the form
"Kennedy#94724379 was#82934872 president#8923423" and then
power some A(G)I with such hand-curated data.  This strikes me as
the ultimate folly, but it seems like the RDF community is chasing this
folly with all its might.

The goal of the atomspace is to allow automated workflows that ingest
and alter data and convert it into formats that other algorithms can act on,
in turn.  The goal of the atomspace is to eliminate human-curated datasets.

--linas



Best regards

--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to opencog+unsubscr...@googlegroups.com.
To post to this group, send email to opencog@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/36aefdb5-424c-4e24-a579-2c2571d2ff64%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to opencog+unsubscr...@googlegroups.com.
To post to this group, send email to opencog@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/1c041d22-0c23-b9ca-0628-fe72dbc78d7f%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
