Re: Identifiers of graphs within spaces

Alessandro Adamou Thu, 09 Feb 2012 07:18:47 -0800

Hi Steve, good to know your progress is putting the OntoNet changes to good use!

I'd like to report back that using the GraphContentInputSource to load our
large ontologies is working well now; I'll let you know if anything crops up
in further testing.

Good! I'll also be glad if you have any figures to share re the size of the graphs in triples, size of the VM and loading times.

On the identifiers -

This seems to work fine, we can capture these and use them to manage updates
and deletes so that the graph can be deleted and added back.

But, multiple graphs seem to be created.


Hmmmmmm

I wouldn't be surprised to see multiple graphs if you were to load the same ontology from multiple input streams, but since you are passing the TcManager to the input source this should not happen.

I will look into this, hopefully later today. Please refer to the same issue you opened:


https://issues.apache.org/jira/browse/STANBOL-426

On another note, there is indeed an issue about defining a policy concerning what should happen if you submit an input source with an ontology that has the same ontology ID (not the Clerezza graph ID but the IRI of the one owl:Ontology individual in the content) as a stored one. It could imply

(a) the creation of a new graph, as it is now,
(b) a graph replacement,
(c) a brutal, monotonic addition of all triples to the existing graph,
(d) no action / an exception, or
(e) some sort of sophisticated (DL-consistent?) merge.

This is still to be thought about. As Acuity, which of these policies would you be happiest about?
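
To make options (a) to (d) concrete, here is a minimal sketch in plain Java. It is not OntoNet code: ToyOntologyStore, DuplicatePolicy and the "graph-N" keys are names invented for illustration, triples are modelled as plain strings, and option (e) is left out because a merge would need a reasoner.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical policies for an incoming ontology whose ontology IRI is already stored. */
enum DuplicatePolicy { NEW_GRAPH, REPLACE, ADD_ALL, REJECT }

/** Toy in-memory store (graph key -> triples as strings). An assumption, not the OntoNet API. */
final class ToyOntologyStore {
    private final Map<String, Set<String>> graphs = new LinkedHashMap<>();
    private final Map<String, String> keyByOntologyIri = new LinkedHashMap<>();
    private int counter = 0;

    /** Stores the triples under the given policy and returns the key of the graph holding them. */
    String add(String ontologyIri, Set<String> triples, DuplicatePolicy policy) {
        String existing = keyByOntologyIri.get(ontologyIri);
        if (existing == null || policy == DuplicatePolicy.NEW_GRAPH) {
            // (a) always create a fresh graph; the newest graph wins the IRI lookup
            String key = "graph-" + (++counter);
            graphs.put(key, new HashSet<>(triples));
            keyByOntologyIri.put(ontologyIri, key);
            return key;
        }
        switch (policy) {
            case REPLACE:   // (b) same key, new content
                graphs.put(existing, new HashSet<>(triples));
                return existing;
            case ADD_ALL:   // (c) monotonic addition to the existing graph
                graphs.get(existing).addAll(triples);
                return existing;
            default:        // (d) refuse the duplicate
                throw new IllegalStateException("Ontology already stored: " + ontologyIri);
        }
    }

    Set<String> triples(String key) { return graphs.get(key); }

    int graphCount() { return graphs.size(); }
}
```

With NEW_GRAPH, two submissions of the same ontology IRI leave two graphs behind, which is the current behaviour described in (a); the other policies keep a single graph per ontology IRI.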


Best regards
Alessandro

Example: adding an ontology using

OntologyInputSource<?, TcProvider> src = new GraphContentInputSource(is,
        (String) null, tcManager);
String ID = space.addOntology(src);

(tcManager grabbed via SCM )

This is getting an ID: ontonet::http://stanbol.apache.org/1328805977033

However, querying using SPARQL I am seeing two graphs with the same content,
the additional graph being (in this case):

org.apache.stanbol.ontologymanager.ontonet.api.io.GraphContentInputSource-1328805976740

(and the same if I directly do tcManager.listMGraphs() and
tcManager.listTripleCollections())
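
A check like the following could confirm which graphs are exact copies of each other. It is a rough sketch in plain Java, not Clerezza code: DuplicateGraphFinder is a made-up helper, and each graph is assumed to have already been dumped into a set of triple strings keyed by graph name. Graphs containing blank nodes would need a proper isomorphism test rather than string equality.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: finds graph names whose triple sets are identical. */
final class DuplicateGraphFinder {

    /** Groups graph names by content and returns only the groups with more than one name. */
    static List<List<String>> findDuplicates(Map<String, Set<String>> graphs) {
        // The triple set itself is the map key, so equal content lands in the same bucket.
        Map<Set<String>, List<String>> byContent = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : graphs.entrySet()) {
            byContent.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<List<String>> duplicates = new ArrayList<>();
        for (List<String> names : byContent.values()) {
            if (names.size() > 1) {
                duplicates.add(names);
            }
        }
        return duplicates;
    }
}
```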

Any idea what's going on here?

Thanks
Steve

-----Original Message-----
From: Stephen Bayliss [mailto:[email protected]]
Sent: 13 January 2012 17:28
To: [email protected]
Subject: RE: Identifiers of graphs within spaces


Hi Alessandro

Thanks very much for this - we're working through the changes. One quick
question:

- you can supply the TcProvider to the GraphContentInputSource. If it is
the same as the TcManager singleton instance, we skip copying all the
triples to yet another Graph. Should take considerably less time

Should we be grabbing the TcProvider with an OSGi SCR
@Reference annotation, or TcManager.getInstance()?

Steve

-----Original Message-----
From: Alessandro Adamou [mailto:[email protected]]
Sent: 11 January 2012 11:59
To: [email protected]
Subject: Re: Identifiers of graphs within spaces


Dear Steve,

thanks for your feedback and sorry for not coming back to you earlier
but I was on vacation until just the other day.

I have committed an update to OntoNet that should address
your inquiries:
- addOntology() on spaces and sessions now returns the String that you
can use as a key to identify the ontology in the OntologyProvider (or
the graph in the TcManager if you create a UriRef from it).
- you can export scopes, spaces and sessions as Clerezza objects if
needed - this does not give you the OWL-oriented view on the graph but
can save much computing power. I will probably employ it on the REST API
- you can supply the TcProvider to the GraphContentInputSource. If it is
the same as the TcManager singleton instance, we skip copying all the
triples to yet another Graph. Should take considerably less time; on the
other hand it prevents you from using this method to *update* graphs.
Note that there are protected binding methods in OntologyInputSource
implementations for triple providers, physical IRIs etc.
- other minor optimizations
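
The copy-skipping idea in the third point can be pictured roughly as below. This is only an illustration in plain Java under stated assumptions: ToyTripleStore and ToyInputSource are invented stand-ins for a triple provider and an input source, and the real GraphContentInputSource may work differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Invented stand-in for a triple provider: graph name -> triples as strings. */
final class ToyTripleStore {
    final Map<String, Set<String>> graphs = new HashMap<>();
}

/** Invented stand-in for an input source that loads into a caller-supplied provider. */
final class ToyInputSource {

    /**
     * Loads the parsed triples into the supplied provider, then copies them
     * into the main store only when the provider is a different object.
     * Returns the number of triples that had to be copied.
     */
    static int store(String name, Set<String> parsed,
                     ToyTripleStore provider, ToyTripleStore mainStore) {
        provider.graphs.put(name, new HashSet<>(parsed));
        if (provider == mainStore) {
            return 0; // already where it belongs: no second copy
        }
        mainStore.graphs.put(name, new HashSet<>(parsed));
        return parsed.size();
    }
}
```

Passing the same object for both arguments is the analogue of handing the TcManager singleton to the input source: the graph is written once and nothing is copied.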

It would be great to share a benchmarking method to assess network
scalability. So far I have managed to load a 200MB RDF/XML graph using a
256MB VM without out-of-memory errors.

Also thanks for the post on the IKS blog (I am telling you here because
I don't know if you and Martin are following an IKS mailing list)! I am
working on an adopter-oriented one, and it would be great to include an
overview on the Acuity experience with Stanbol-Fedora - what it does and
what benefit it gets from Stanbol. Would you like to share?
Unfortunately, I have been able to tell only my side of the story so
far, as the link at [1] keeps timing out on me :(

Thanks a lot, keep up the good work!

Alessandro

[1] fedora-stanbol.acuityunlimited.net:18080/orbeon/stanbol-fedora/data-browser


On 12/30/11 6:08 PM, Stephen Bayliss wrote:

Hi Alessandro

Thanks very much for your responses.

Dear Steve,

On 12/19/11 6:22 PM, Stephen Bayliss wrote:

Our use-case is thus:

1) Create OntologyContentInputSource(stream)

Perhaps you're better off with a GraphContentInputSource(InputStream),
so it won't have to go through the burden of converting from
OWLOntology to Graph just in order to store it (everything is
stored as Clerezza graphs anyhow). Note that OWLOntology exports of
scopes, spaces and the ontologies within are possible regardless of the
input source (although it is THE bottleneck of the current
implementation, see my comment to STANBOL-433).

I'm now adding the possibility to specify the TcProvider in the
GraphContentInputSource constructor. This should also save the burden
of copying the triples from the in-memory SimpleGraph to the Graph
stored in the TcManager (IF you pass the TcManager singleton as
TcProvider).

Great, we'll take a look at the GraphContentInputSource and the
TcProvider constructor argument.

      - as our content is behind authentication, the stream is provided
        by an HTTP client
      - the content has an identifier (URI) assigned by the external
        system (independent of the contents of the stream/ontology)
2) Load OntologyInputSource into the space with
CustomOntologySpace.addOntology(...)
3) When updated content comes along:
      - remove the original (from the store as well as the space)
      - add the updated content

As the OntologyInputSource was created from a stream, it doesn't have
a physical IRI (I think?),

correct

Actually logically it does have a physical IRI - the one that our HTTP
client sourced the input stream from - so if there was an option to
specify the physical IRI somehow, maybe this would in fact do the job?

so at (2) we don't have a "KReS identifier" for it
- so if we want to replace the ontology in the future with an updated
version I can't see currently an easy way of determining which
ontology to remove from the space and then delete it prior to adding
the updated content.

if the ontology is named (i.e. it does have a logical IRI even if not
a physical one), you could simply call
OntologyProvider#getKey(logicalIRI), but if you would like something
simpler... see my next comment below.

I can list the graph keys through the OntologyProvider; but I think
what I need is to know (or be able to set?) the key when adding it?

Would it be enough if this key were the return value of
addOntology()?

If there's no logical way of passing in an identifier that we wish to
use for the graph, then I think this would do the job; we can maintain
our own map/index of the graph keys vs the content provider's URIs for
these graphs.
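
Such a map/index might look roughly like this. It is a sketch in plain Java, not Stanbol code: GraphKeyIndex is a made-up name, and the two callbacks are placeholders for whatever actually adds an ontology and returns its key (for example addOntology()) and whatever removes and deletes a graph by key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical index: content provider's URI -> graph key returned when the graph was added. */
final class GraphKeyIndex {
    private final Map<String, String> keyBySourceUri = new HashMap<>();
    private final Function<String, String> adder; // content -> graph key (placeholder for the add call)
    private final Consumer<String> remover;       // graph key -> removal (placeholder for the delete call)

    GraphKeyIndex(Function<String, String> adder, Consumer<String> remover) {
        this.adder = adder;
        this.remover = remover;
    }

    /** Adds content for sourceUri, first removing any graph stored for it earlier; returns the new key. */
    String put(String sourceUri, String content) {
        String oldKey = keyBySourceUri.get(sourceUri);
        if (oldKey != null) {
            remover.accept(oldKey); // update = remove the original, then add the new content
        }
        String newKey = adder.apply(content);
        keyBySourceUri.put(sourceUri, newKey);
        return newKey;
    }

    /** Returns the graph key currently recorded for sourceUri, or null if none. */
    String keyFor(String sourceUri) {
        return keyBySourceUri.get(sourceUri);
    }
}
```

The index would have to be persisted alongside the store if the keys need to survive a restart; that part is not shown.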

Also I can see that if I get the TcProvider I can do a
.deleteTripleCollection(UriRef ref) - how would this UriRef link in
with the above (when I look at the identifiers of the ontologies
retrieved using the keys from listGraphs, these are
"Anonymous-xyz" and don't have an IRI).

I'll have to look into this one, fortunately I've still got some time
on it.

Great, thanks!

All the best,

Alessandro

--
M.Sc. Alessandro Adamou

Alma Mater Studiorum - Università di Bologna
Department of Computer Science
Mura Anteo Zamboni 7, 40127 Bologna - Italy

Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, 00161 Rome - Italy


"As for the charges against me, I am unconcerned. I am beyond their
timid, lying morality, and so I am beyond caring."
(Col. Walter E. Kurtz)






