Hi Linas,
Tx for taking the time to reply.
On Sunday, February 18, 2018 at 8:38:28 PM UTC+1, linas wrote:
>
> Hi Amirouche,
>
> I skimmed th PDF's. Some of the goals are laudable: I quote:
> "the exchange of partial graphs, personalised views on data
> and a need for trust. A strong versioning model supported by
> provenance".
>
>
> So, at a meta-level, yes. In practice, a triple-store strikes me as
> worthless, pointless, hopeless. Perhaps I am wrong -- I would love
> it if someone explained it to me. So from the PDF:
>
> ":Adam :knows :Bob"
>
> Great. Who? Fat Bob or skinny Bob? Did you mean Robert? Robert,
> like the guy down the hall, or Robert, the salesman who visits every
> Tuesday? Are you 100% certain about that? Or are you just guessing?
>
Bob is an identifier. It can actually be Fat Bob or skinny Bob; it depends on
what other triples are associated with Bob.
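To make that concrete, here is a minimal Python sketch (all identifiers and predicates are made up for illustration): two distinct identifiers are both labeled "Bob", and the surrounding triples are what disambiguate them.

```python
# Two distinct identifiers can both carry the label "Bob"; the
# extra triples are what tell them apart (names are invented).
triples = {
    (":bob1", ":label", "Bob"),
    (":bob1", ":build", "fat"),
    (":bob2", ":label", "Bob"),
    (":bob2", ":build", "skinny"),
    (":adam", ":knows", ":bob1"),
}

# Which Bob does Adam know?  Follow the edge, then look up his build.
known = next(o for (s, p, o) in triples if s == ":adam" and p == ":knows")
build = next(o for (s, p, o) in triples if s == known and p == ":build")
print(build)  # -> fat
```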
> In my personal experience, triples are wholly inadequate to deal with
> knowledge representation of the above form. If I'm wrong, let me know
> how.
>
RDF triple stores represent a graph or multiple graphs. The difference in
terms of storage between a triple store and a property graph is that a triple
store puts the emphasis on edges. That is, it stores and queries *labeled
edges* called triples. For instance:
Bob knows Alice
is a representation of the *directed edge* between the node called "*Bob*"
and the node called "*Alice*", with the label "*knows*". In particular, in a
triple store, node properties are represented as edges too, *ie.* as triples,
which sounds counter-intuitive compared to the property graph approach, where
you *store on disk* all the properties of a node together with the node
itself, link it to its incoming and outgoing edges, and do similarly for
edges, hence the linked-list storage approach. In a triple store, a triple is
decomposed into:
Subject Predicate Object
and those three parts are stored together.
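As a toy illustration of that storage model (a Python sketch I made up for this mail, not an RDF implementation): triples are stored whole as (subject, predicate, object) and queried by pattern, with node properties appearing as ordinary triples.

```python
# Minimal in-memory triple store: each triple is stored together as
# (subject, predicate, object) and queried by pattern, where None
# acts as a wildcard.
class TripleStore:
    def __init__(self):
        self.triples = []

    def add(self, s, p, o):
        self.triples.append((s, p, o))

    def match(self, s=None, p=None, o=None):
        for t in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t

store = TripleStore()
store.add("Bob", "knows", "Alice")
# Node properties are just triples too:
store.add("Bob", "age", 42)

print(list(store.match(s="Bob")))
```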
An advanced triple store (like Datomic) doesn't assume anything in particular
about the Object (called Value in Datomic) in terms of indexing until you
specify it in the schema; see this paragraph in the Datomic documentation
<https://docs.datomic.com/on-prem/indexes.html#avet>. Similarly, depending on
the schema, you can define a particular predicate (or set of predicates) to
be indexed in a particular fashion (see fulltext eventual indexing in
Datomic), or with spatiotemporal indexing, or whatever.
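The idea can be sketched like this in Python (the schema keys and flags are invented; this only mimics the shape of Datomic's AVET index, not its API): only predicates declared as indexed in the schema get a value index.

```python
from collections import defaultdict

# Sketch of schema-driven indexing: only predicates declared as
# indexed get a (attribute, value) -> entities index, loosely
# analogous to Datomic's AVET index.  Schema flags are made up.
schema = {"age": {"indexed": True}, "nickname": {"indexed": False}}

facts = []
avet = defaultdict(set)  # (attribute, value) -> set of entities

def assert_fact(e, a, v):
    facts.append((e, a, v))
    if schema.get(a, {}).get("indexed"):
        avet[(a, v)].add(e)

assert_fact("bob", "age", 42)
assert_fact("alice", "age", 42)
assert_fact("bob", "nickname", "Bobby")

print(avet[("age", 42)])              # found via the value index
print(("nickname", "Bobby") in avet)  # False: not an indexed predicate
```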
You can have an infinite set of predicates (hence an infinite set of
properties, in property graph terms), because the *query engine streams
everything*.
Datomic is actually a versioned triple store with linear history, except it
doesn't claim conformance with the RDF standards (for good reason, because
many think EAV and RDF are failures). AFAIU the implementation of a triple
store uses the same techniques described in these (new) documents:
- https://docs.datomic.com/on-prem/indexes.html
- https://docs.datomic.com/on-prem/schema.html
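To show what I mean by "versioned with linear history", here is a Python sketch of the datom-style model (my own toy, not Datomic's actual representation): each fact carries a transaction id and an added/retracted flag, nothing is overwritten, and past states are reconstructed by replaying history.

```python
# Sketch of a versioned EAV store with linear history: each datom
# is (entity, attribute, value, tx, added).  Facts are never
# overwritten; "as of tx" views replay the log up to that point.
datoms = []

def transact(tx, ops):
    for (e, a, v, added) in ops:
        datoms.append((e, a, v, tx, added))

def value_as_of(e, a, tx):
    v = None
    for (de, da, dv, dtx, added) in datoms:
        if de == e and da == a and dtx <= tx:
            v = dv if added else None
    return v

transact(1, [("bob", "age", 41, True)])
transact(2, [("bob", "age", 41, False), ("bob", "age", 42, True)])

print(value_as_of("bob", "age", 1))  # -> 41
print(value_as_of("bob", "age", 2))  # -> 42
```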
> The atomspace was designed to hold arbitrary expressions, like "Adam knows
> Bob, the curly-haired, fat and always-smiling Bob, and I know this because
> Carol overheard it while standing in line at the cafeteria for lunch. So
> I'm 98%
> certain that Adam knows Bob."
>
That is also possible in a triple store. In a property graph you reify
hyper-edges as multiple vertices and edges; a similar technique is used in
triple stores to express something about a particular triple.
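A Python sketch of that reification technique (the statement node and predicate names are made up; RDF has its own reification vocabulary): to say something *about* the triple "Adam knows Bob", introduce a node standing for the statement itself, then attach confidence and provenance to that node with ordinary triples.

```python
# Reification sketch: "stmt1" is a node standing for the statement
# ("Adam", "knows", "Bob"); further triples about stmt1 carry the
# confidence and the source.  All identifiers are invented.
triples = {
    ("stmt1", "subject", "Adam"),
    ("stmt1", "predicate", "knows"),
    ("stmt1", "object", "Bob"),
    ("stmt1", "confidence", 0.98),
    ("stmt1", "source", "Carol"),
}

confidence = next(o for (s, p, o) in triples
                  if s == "stmt1" and p == "confidence")
source = next(o for (s, p, o) in triples
              if s == "stmt1" and p == "source")
print(confidence, source)  # -> 0.98 Carol
```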
It's clear to me that you can do the same thing with triple stores and
property graphs, except that triple stores don't assume all the properties of
a given node can stay in RAM. But that was not my question.
> Should the atomspace also include some default model for exchange of
> partial graphs, versioning and provenance? Maybe. So far, we have little
> or no experience with any of this - no one has needed or asked for this,
> so I cannot guess if our current infrastructure is adequate, or if we need
> yet more. In a certain sense, versioning and provenance is already
> built into the atomspace, in a "strong" way. But no one uses it.
>
That's the heart of my question. It seems the answer is simply: no.
Basically, what I wanted to know is: *as a scientist working with loads of
structured data, do you feel the need to share and version structured data
in multiple branches to do your work?* My guess would have been "yes",
because for instance the thread "Preloading atomspace with initial data
<https://groups.google.com/forum/#!topic/opencog/jB7N-WV3wOs>" makes me
think of that: there you explain that the best way to go is to dump
postgresql.
I have other use cases in mind. For instance, *IF* (a big if; I don't say
that's what should be done) link grammar's and relex's dictionaries were
stored in a versioned atomspace, it would be easier (I think) to test new
grammars... But today the current workflow is good enough for those cases,
because the grammars are rather small and stay in RAM.
BUT (another big if), if AGI projects rely more and more on *curated*
structured databases, statistically built, that are bigger than RAM (ie.
imagine wikidata built out of unstructured text *that must be edited*), then
I think a versioned atomspace and proper tooling (that is, a tool like
wikibase) makes sense.
That said, like you say, it's already possible for you to use the atomspace
to version data and use branches.
*My question: is versioning and branching bigger-than-RAM structured data
part of your workflow?*
Just to be clear: I am not planning to replace the atomspace anytime soon.
You have invested too much in the atomspace, and are moving toward more
integration with it, which makes that unimaginable. Also, the only
improvement would be easier workflows, instead of, for instance, dumping the
atomspace into wikibase, editing it there, then dumping it again and loading
it back into the atomspace, which is not a workflow in current practice
AFAIK. (Also, wikibase is not a good tool to edit arbitrary graphs.)
Best regards
--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/36aefdb5-424c-4e24-a579-2c2571d2ff64%40googlegroups.com.