Hi Alessandro,

I'd like to report back that using the GraphContentInputSource to load our large ontologies is working well now; I'll let you know if anything crops up in further testing.

On the identifiers: this seems to work fine. We can capture them and use them to manage updates and deletes, so that a graph can be deleted and added back.

However, multiple graphs seem to be created. For example, adding an ontology using

    OntologyInputSource<?, TcProvider> src = new GraphContentInputSource(is, (String) null, tcManager);
    String ID = space.addOntology(src);

(tcManager obtained via SCR) returns the ID:

    ontonet::http://stanbol.apache.org/1328805977033

Querying with SPARQL, though, I see two graphs with the same content, the additional graph being (in this case):

    org.apache.stanbol.ontologymanager.ontonet.api.io.GraphContentInputSource-1328805976740

(and the same if I directly call tcManager.listMGraphs() and tcManager.listTripleCollections()).

Any idea what's going on here?

Thanks
Steve

> -----Original Message-----
> From: Stephen Bayliss [mailto:[email protected]]
> Sent: 13 January 2012 17:28
> To: [email protected]
> Subject: RE: Identifiers of graphs within spaces
>
> Hi Alessandro
>
> Thanks very much for this - we're working through the changes. One quick
> question:
>
> > - you can supply the TcProvider to the GraphContentInputSource. If it is
> > the same as the TcManager singleton instance, we skip copying all the
> > triples to yet another Graph. Should take considerably less time
>
> Should we be grabbing the TcProvider with an OSGi SCR @Reference
> annotation, or TcManager.getInstance() ?
>
> Steve
>
> > -----Original Message-----
> > From: Alessandro Adamou [mailto:[email protected]]
> > Sent: 11 January 2012 11:59
> > To: [email protected]
> > Subject: Re: Identifiers of graphs within spaces
> >
> > Dear Steve,
> >
> > thanks for your feedback and sorry for not coming back to you earlier
> > but I was on vacation until just the other day.
> >
> > I have committed an update to OntoNet that should address your
> > inquiries:
> > - addOntology() on spaces and sessions now returns the String that you
> >   can use as a key to identify the ontology in the OntologyProvider (or
> >   the graph in the TcManager if you create a UriRef from it).
> > - you can export scopes, spaces and sessions as Clerezza objects if
> >   needed - does not give you the OWL-oriented view on the graph but can
> >   save much computing power. I will probably employ it on the REST API
> > - you can supply the TcProvider to the GraphContentInputSource. If it is
> >   the same as the TcManager singleton instance, we skip copying all the
> >   triples to yet another Graph. Should take considerably less time; on
> >   the other hand it prevents from using this method to *update* graphs.
> >   Note that there are protected binding methods in OntologyInputSource
> >   implementations for triple providers, physical IRIs etc.
> > - other minor optimizations
> >
> > It would be great to share a benchmarking method to assess network
> > scalability. So far I have managed to load a 200MB RDF/XML graph using a
> > 256MB VM without out-of-memory errors.
> >
> > Also thanks for the post on the IKS blog (I am telling you here because
> > I don't know if you and Martin are following an IKS mailing list)! I am
> > working on an adopter-oriented one, and it would be great to include an
> > overview on the Acuity experience with Stanbol-Fedora - what it does and
> > what benefit it gets from Stanbol. Would you like to share?
> > Unfortunately, I have been able to tell only my side of the story so
> > far, as the link at [1] keeps timing out on me :(
> >
> > Thanks a lot, keep up the good work!
> >
> > Alessandro
> >
> > [1] fedora-stanbol.acuityunlimited.net:18080/orbeon/stanbol-fedora/data-browser
> >
> > On 12/30/11 6:08 PM, Stephen Bayliss wrote:
> > > Hi Alessandro
> > >
> > > Thanks very much for your responses.
> > >
> > >> Dear Steve,
> > >>
> > >> On 12/19/11 6:22 PM, Stephen Bayliss wrote:
> > >>> Our use-case is thus:
> > >>>
> > >>> 1) Create OntologyContentInputSource(stream)
> > >> Perhaps you're better off with a GraphContentInputSource(InputStream),
> > >> so it won't have to go through the burden of converting from
> > >> OWLOntology to Graph just in order to store it (everything is
> > >> stored as Clerezza graphs anyhow). Note that OWLOntology exports of
> > >> scopes, spaces and ontologies within is possible regardless of the
> > >> input source (although it is THE bottleneck of the current
> > >> implementation, see my comment to STANBOL-433).
> > >>
> > >> I'm now adding the possibility to specify the TcProvider in the
> > >> GraphContentInputSource constructor. This should also save the burden
> > >> of copying the triples from the in-memory SimpleGraph to the Graph
> > >> stored in the TcManager (IF you pass the TcManager singleton as
> > >> TcProvider).
> > > Great, we'll take a look at the GraphContentInputSource and the
> > > TcProvider constructor argument.
> > >
> > >>> - as our content is behind authentication, the stream is provided
> > >>>   by an HTTP client
> > >>> - the content has an identifier (URI) assigned by the external
> > >>>   system (independent of the contents of the stream/ontology)
> > >>> 2) Load OntologyInputSource into the space with
> > >>> CustomOntologySpace.addOntology(...)
> > >>> 3) When updated content comes along:
> > >>> - remove the original (from the store as well as the space)
> > >>> - add the updated content
> > >>>
> > >>> As the OntologyInputSource was created from a stream, it doesn't have
> > >>> a physical IRI (I think?),
> > >> correct
> > > Actually logically it does have a physical IRI - the one that our HTTP
> > > client sourced the input stream from - so if there was an option to
> > > specify the physical IRI somehow, maybe this would in fact do the job?
> > >
> > >>> so at (2) we don't have a "KReS identifier" for it
> > >>> - so if we want to replace the ontology in the future with an updated
> > >>> version I can't see currently an easy way of determining which
> > >>> ontology to remove from the space and then delete it prior to adding
> > >>> the updated content.
> > >> if the ontology is named (i.e. it does have logical IRI even if not
> > >> a physical one), you could simply call
> > >> OntologyProvider#getKey(logicalIRI), but if you would like something
> > >> simpler... see my next comment below.
> > >>
> > >>> I can list the graph keys through the OntologyProvider; but I think
> > >>> what I need is to know (or be able to set?) the key when adding it?
> > >> Would it be enough if this key were the return value of
> > >> addOntology() ?
> > > If there's no logical way of passing in an identifier that we wish to
> > > use for the graph, then I think this would do the job; we can maintain
> > > our own map/index of the graph keys vs the content provider's URIs for
> > > these graphs.
> > >
> > >>> Also I can see that if I get the TcProvider I can do a
> > >>> .deleteTripleCollection(UriRef ref) - how would this UriRef link in
> > >>> with the above (when I look at the identifiers of the ontologies
> > >>> retrieved using the keys from listGraphs, these are
> > >>> "Anonymous-xyz" and don't have an IRI).
> > >> I'll have to look into this one, fortunately I've still got some time
> > >> on it.
> > > Great, thanks!
> > >
> > >> All the best,
> > >>
> > >> Alessandro
> > >>
> > >> --
> > >> M.Sc. Alessandro Adamou
> > >>
> > >> Alma Mater Studiorum - Università di Bologna
> > >> Department of Computer Science
> > >> Mura Anteo Zamboni 7, 40127 Bologna - Italy
> > >>
> > >> Semantic Technology Laboratory (STLab)
> > >> Institute for Cognitive Science and Technology (ISTC)
> > >> National Research Council (CNR)
> > >> Via Nomentana 56, 00161 Rome - Italy
> > >>
> > >> "As for the charges against me, I am unconcerned. I am beyond their
> > >> timid, lying morality, and so I am beyond caring." (Col. Walter E.
> > >> Kurtz)
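
For reference, the "map/index of the graph keys vs the content provider's URIs" discussed in the thread could look roughly like the sketch below. It is only an illustration under assumptions: OntologyStore, InMemoryStore and GraphKeyIndex are hypothetical names, not part of the Stanbol or Clerezza APIs, and the store interface merely stands in for whatever combination of space.addOntology(...) and graph deletion is actually used. The one idea taken from the thread is that the key returned when adding is recorded against the external content URI, so that an update can remove the old graph before adding the new one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the space/store pair; NOT the Stanbol API. */
interface OntologyStore {
    /** Stores the content and returns the key assigned to it. */
    String add(byte[] content);

    /** Removes the graph stored under the given key. */
    void remove(String key);
}

/** Trivial in-memory store, used only to make the sketch runnable. */
final class InMemoryStore implements OntologyStore {
    private final Map<String, byte[]> graphs = new HashMap<>();
    private long counter = 0;

    @Override
    public String add(byte[] content) {
        String key = "graph-" + (++counter); // made-up key format
        graphs.put(key, content);
        return key;
    }

    @Override
    public void remove(String key) {
        graphs.remove(key);
    }

    int size() {
        return graphs.size();
    }
}

/** Index from the content provider's URI to the key returned by the store. */
final class GraphKeyIndex {
    private final Map<String, String> keyByContentUri = new HashMap<>();
    private final OntologyStore store;

    GraphKeyIndex(OntologyStore store) {
        this.store = store;
    }

    /** Adds the content, first removing any graph previously stored for this URI. */
    String put(String contentUri, byte[] content) {
        // Drop the stale mapping first so a failed add cannot leave a dangling key.
        String oldKey = keyByContentUri.remove(contentUri);
        if (oldKey != null) {
            store.remove(oldKey);
        }
        String newKey = store.add(content);
        keyByContentUri.put(contentUri, newKey);
        return newKey;
    }

    /** Returns the current key for the URI, if one has been recorded. */
    Optional<String> keyFor(String contentUri) {
        return Optional.ofNullable(keyByContentUri.get(contentUri));
    }
}
```

The index here lives in memory only; in a real deployment it would presumably need to be persisted alongside the graphs so that the mapping survives a restart.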
