Hi Linas,
Tx for taking the time to reply.
On Sunday, February 18, 2018 at 8:38:28 PM UTC+1, linas wrote:
>
> Hi Amirouche,
>
> I skimmed th PDF's. Some of the goals are laudable: I quote:
> "the exchange of partial graphs, personalised views on data
> and a need for trust. A strong versioning model supported by
> provenance".
>
>
> So, at a meta-level, yes. In practice, a triple-store strikes me as
> worthless, pointless, hopeless. Perhaps I am wrong -- I would love
> it if someone explained it to me. So from the PDF:
>
> ":Adam :knows :Bob"
>
> Great. Who? Fat Bob or skinny Bob? Did you mean Robert? Robert,
> like the guy down the hall, or Robert, the salesman who visits every
> Tuesday? Are you 100% certain about that? Or are you just guessing?
>
Bob is an identifier. It can actually be Fat Bob or skinny Bob; it depends on
what other triples are associated with Bob.
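To make that concrete, here is a minimal Python sketch (all identifiers and predicates are made up for illustration): two distinct identifiers are both labeled "Bob", and the surrounding triples are what disambiguate them.

```python
# Two distinct identifiers can both carry the label "Bob"; the
# extra triples are what tell them apart (names are invented).
triples = {
    (":bob1", ":label", "Bob"),
    (":bob1", ":build", "fat"),
    (":bob2", ":label", "Bob"),
    (":bob2", ":build", "skinny"),
    (":adam", ":knows", ":bob1"),
}

# Which Bob does Adam know?  Follow the edge, then look up his build.
known = next(o for (s, p, o) in triples if s == ":adam" and p == ":knows")
build = next(o for (s, p, o) in triples if s == known and p == ":build")
print(build)  # -> fat
```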
> In my personal experience, triples are wholly inadequate to deal with
> knowledge representation of the above form. If I'm wrong, let me know
> how.
>
RDF triple stores represent a graph or multiple graphs. The difference in
terms of storage between a triple store and a property graph is that a triple
store puts the emphasis on edges. That is, it stores and queries *labeled
edges* called triples. For instance:
Bob knows Alice
is a representation of the *directed edge* between the node called "*Bob*"
and the node called "*Alice*", with the label "*knows*". In particular, in a
triple store, node properties are represented as edges too, *ie.* as triples,
which sounds counter-intuitive compared to the property graph approach, where
you *store on disk* all the properties of a node together with the node
itself, link it to its incoming and outgoing edges, and do similarly for
edges, hence the linked-list storage approach. In a triple store, a triple is
decomposed into:
Subject Predicate Object
and those three parts are stored together.
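As a toy illustration of that storage model (a Python sketch I made up for this mail, not an RDF implementation): triples are stored whole as (subject, predicate, object) and queried by pattern, with node properties appearing as ordinary triples.

```python
# Minimal in-memory triple store: each triple is stored together as
# (subject, predicate, object) and queried by pattern, where None
# acts as a wildcard.
class TripleStore:
    def __init__(self):
        self.triples = []

    def add(self, s, p, o):
        self.triples.append((s, p, o))

    def match(self, s=None, p=None, o=None):
        for t in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t

store = TripleStore()
store.add("Bob", "knows", "Alice")
# Node properties are just triples too:
store.add("Bob", "age", 42)

print(list(store.match(s="Bob")))
```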
An advanced triple store (like Datomic) doesn't assume anything in particular
about the Object (called Value in Datomic) in terms of indexing until you
specify it in the schema; see this paragraph in the Datomic documentation
<https://docs.datomic.com/on-prem/indexes.html#avet>. Similarly, depending on
the schema, you can define a particular predicate (or set of predicates) to
be indexed in a particular fashion (see fulltext eventual indexing in
Datomic), or with spatiotemporal indexing, or whatever.
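The idea can be sketched like this in Python (the schema keys and flags are invented; this only mimics the shape of Datomic's AVET index, not its API): only predicates declared as indexed in the schema get a value index.

```python
from collections import defaultdict

# Sketch of schema-driven indexing: only predicates declared as
# indexed get a (attribute, value) -> entities index, loosely
# analogous to Datomic's AVET index.  Schema flags are made up.
schema = {"age": {"indexed": True}, "nickname": {"indexed": False}}

facts = []
avet = defaultdict(set)  # (attribute, value) -> set of entities

def assert_fact(e, a, v):
    facts.append((e, a, v))
    if schema.get(a, {}).get("indexed"):
        avet[(a, v)].add(e)

assert_fact("bob", "age", 42)
assert_fact("alice", "age", 42)
assert_fact("bob", "nickname", "Bobby")

print(avet[("age", 42)])              # found via the value index
print(("nickname", "Bobby") in avet)  # False: not an indexed predicate
```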
You can have an infinite set of predicates (hence an infinite set of
properties, in property graph terms), because the *query engine streams
everything*.
Datomic is actually a versioned triple store with linear history, except it
doesn't claim conformance with the RDF standards (for good reason, because
many think EAV and RDF are failures). AFAIU the implementation of a triple
store uses the same techniques described in these (new) documents:
- https://docs.datomic.com/on-prem/indexes.html
- https://docs.datomic.com/on-prem/schema.html
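To show what I mean by "versioned with linear history", here is a Python sketch of the datom-style model (my own toy, not Datomic's actual representation): each fact carries a transaction id and an added/retracted flag, nothing is overwritten, and past states are reconstructed by replaying history.

```python
# Sketch of a versioned EAV store with linear history: each datom
# is (entity, attribute, value, tx, added).  Facts are never
# overwritten; "as of tx" views replay the log up to that point.
datoms = []

def transact(tx, ops):
    for (e, a, v, added) in ops:
        datoms.append((e, a, v, tx, added))

def value_as_of(e, a, tx):
    v = None
    for (de, da, dv, dtx, added) in datoms:
        if de == e and da == a and dtx <= tx:
            v = dv if added else None
    return v

transact(1, [("bob", "age", 41, True)])
transact(2, [("bob", "age", 41, False), ("bob", "age", 42, True)])

print(value_as_of("bob", "age", 1))  # -> 41
print(value_as_of("bob", "age", 2))  # -> 42
```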
> The atomspace was designed to hold arbitrary expressions, like "Adam knows
> Bob, the curly-haired, fat and always-smiling Bob, and I know this because
> Carol overheard it while standing in line at the cafeteria for lunch. So
> I'm 98%
> certain that Adam knows Bob."
>
That is also possible in a triple store. In a property graph you reify
hyper-edges as multiple vertices and edges; a similar technique is used in
triple stores to express something about a particular triple.
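A Python sketch of that reification technique (the statement node and predicate names are made up; RDF has its own reification vocabulary): to say something *about* the triple "Adam knows Bob", introduce a node standing for the statement itself, then attach confidence and provenance to that node with ordinary triples.

```python
# Reification sketch: "stmt1" is a node standing for the statement
# ("Adam", "knows", "Bob"); further triples about stmt1 carry the
# confidence and the source.  All identifiers are invented.
triples = {
    ("stmt1", "subject", "Adam"),
    ("stmt1", "predicate", "knows"),
    ("stmt1", "object", "Bob"),
    ("stmt1", "confidence", 0.98),
    ("stmt1", "source", "Carol"),
}

confidence = next(o for (s, p, o) in triples
                  if s == "stmt1" and p == "confidence")
source = next(o for (s, p, o) in triples
              if s == "stmt1" and p == "source")
print(confidence, source)  # -> 0.98 Carol
```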
It's clear to me that you can do the same thing with triple stores and
property graphs, except that triple stores don't assume all the properties of
a given node can stay in RAM. But that was not my question.
> Should the atomspace also include some default model for exchange of
> partial graphs, versioning and provenance? Maybe. So far, we have little
> or no experience with any of this - no one has needed or asked for this,
> so I cannot guess if our current infrastructure is adequate, or if we need
> yet more. In a certain sense, versioning and provenance is already
> built into the atomspace, in a "strong" way. But no one uses it.
>
That's the heart of my question. It seems the answer is simply: no.
Basically, what I wanted to know is: *as a scientist working with loads of
structured data, do you feel the need to share and version structured data
in multiple branches to do your work?* My guess would have been "yes",
because for instance the thread "Preloading atomspace with initial data
<https://groups.google.com/forum/#!topic/opencog/jB7N-WV3wOs>" makes me
think of that: there you explain that the best way to go is to dump
postgresql.
I have other use cases in mind. For instance, *IF* (a big if; I don't say
that's what should be done) link grammar's and relex's dictionaries were
stored in a versioned atomspace, it would be easier (I think) to test new
grammars... But today the current workflow is good enough for those cases,
because the grammars are rather small and stay in RAM.
BUT (another big if), if AGI projects rely more and more on *curated*
structured databases, statistically built, that are bigger than RAM (ie.
imagine wikidata built out of unstructured text *that must be edited*), then
I think a versioned atomspace and proper tooling (that is, a tool like
wikibase) makes sense.
That said, like you say, it's already possible for you to use the atomspace
to version data and use branches.
*My question: is versioning and branching bigger-than-RAM structured data
part of your workflow?*
Just to be clear: I am not planning to replace the atomspace anytime soon.
You have invested too much in the atomspace, and are moving toward more
integration with it, which makes that unimaginable. Also, the only
improvement would be easier workflows, instead of, for instance, dumping the
atomspace into wikibase, editing it there, then dumping it again and loading
it back into the atomspace, which is not a workflow in current practice
AFAIK. (Also, wikibase is not a good tool to edit arbitrary graphs.)
Best regards
--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/36aefdb5-424c-4e24-a579-2c2571d2ff64%40googlegroups.com.