Re: Is this a good way to get started?

Nate Marks Fri, 12 Dec 2014 04:51:39 -0800

Rob,

Thank you again for following up. This is great information and opens the
doors to some additional research I need to do as well.



I'm really excited about Jena. The only thing better than finding great
technology is finding a vibrant and helpful user group.


I'll be back soon and thanks again!

On Fri, Dec 12, 2014 at 5:16 AM, Rob Walpole <[email protected]> wrote:

> Hi again Nate,
>
> > When you talk about your T-box data, I think this would contain class
> > hierarchies, information about which ones are disjoint, etc. Is that
> > right?
> >
>
> Exactly.. and it is a tiny dataset compared to the instance data on the A-box
> side.
>
> >
> > Is there ever a risk that a change to the ontology component (T-box) can
> > invalidate the data component (A-box)? If so, how do you manage that?
> >
>
> Definitely.. but in that case we create a patch using SPARQL Update and
> apply the patch to the data. We keep the patch in our version control
> system so that we have a record of the changes we have made and can
> re-apply them if necessary.
>
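
A minimal sketch of what such a SPARQL Update patch might look like, generated here from Python so the text can be saved to a file and kept in version control. The helper name, the graph IRI and the class IRIs are all hypothetical; the thread does not show an actual patch. It assumes the ontology change is a renamed class and that the instance data lives in one named graph.

```python
# Hypothetical helper: builds a SPARQL 1.1 Update "patch" as plain text.
# All IRIs used below are made-up examples, not taken from the thread.

def rename_class_patch(graph_iri: str, old_class: str, new_class: str) -> str:
    """Return an update that retypes instances of old_class as new_class."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        # WITH scopes the DELETE, INSERT and WHERE parts to one named graph.
        f"WITH <{graph_iri}>\n"
        f"DELETE {{ ?s rdf:type <{old_class}> }}\n"
        f"INSERT {{ ?s rdf:type <{new_class}> }}\n"
        f"WHERE  {{ ?s rdf:type <{old_class}> }}\n"
    )


patch = rename_class_patch(
    "http://example.org/graph/abox",
    "http://example.org/ontology#Asset",
    "http://example.org/ontology#PhysicalAsset",
)
print(patch)
```

The printed text could be committed as a patch file and sent to the store's SPARQL Update endpoint whenever the data needs to be brought back in line with the ontology.
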
> >
> > When you load straight to the triple store, is there a single RDF file? If
> > not, do you use an assembler to gather multiple files?
> >
>
> This depends. We happen to be using named graphs - I don't know whether
> this is appropriate for you or not. We also happen to be using Jena TDB so
> if the data is held in N-Quad format then we load a single file which
> contains separate graphs. Jena allows you to do this using the tdbloader
> command line tool. We could just as easily load separate RDF files that
> were in a triple format such as Turtle and specify the graph name during
> loading. The result is the same. I wouldn't get too hung up on named graphs
> unless you think they are really appropriate for you though as they do add
> some complexity to updating the data which it may be better to avoid at
> first. The reason we chose to do this is that our ontology is still
> developing and we wanted to be able to delete terms that we had decided to
> dump without leaving cruft in the triplestore. Dropping the graph seemed to
> be the best way to achieve this.
>
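
To illustrate how a single N-Quads file can carry several named graphs, here is a small sketch that turns line-based N-Triples into N-Quads by adding a graph label to each statement. It assumes N-Triples input with one statement per line and no trailing comments (Turtle would need a real RDF parser); the function name and all IRIs are hypothetical.

```python
# Hypothetical helper: N-Quads is N-Triples plus an optional graph label
# placed before the terminating '.' of each statement.

def to_nquads(ntriples_text: str, graph_iri: str) -> str:
    """Append a graph label to every N-Triples statement, yielding N-Quads."""
    quads = []
    for line in ntriples_text.splitlines():
        stmt = line.strip()
        if not stmt or stmt.startswith("#"):
            continue  # skip blank lines and comment lines
        if not stmt.endswith("."):
            raise ValueError(f"not a complete N-Triples statement: {stmt!r}")
        # Drop the terminating '.', then re-add it after the graph label.
        quads.append(f"{stmt[:-1].rstrip()} <{graph_iri}> .")
    return "".join(q + "\n" for q in quads)


RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
tbox = f"<http://example.org/ontology#Asset> {RDF_TYPE} <http://www.w3.org/2002/07/owl#Class> ."
abox = f"<http://example.org/data/pump1> {RDF_TYPE} <http://example.org/ontology#Asset> ."

# One output, two graphs: ontology statements and instance statements.
dataset = (
    to_nquads(tbox, "http://example.org/graph/tbox")
    + to_nquads(abox, "http://example.org/graph/abox")
)
print(dataset)
```

Each output line ends with the graph IRI, so a loader that understands N-Quads can place the ontology and the instance data in separate graphs from one file.
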
> >
> > Does separating the T-box and A-box data have any downsides? Is it
> > invisible to reasoners, for example?
> >
>
> Yes, as I say, using named graphs adds complexity to updates. We are using
> Fuseki and we specify "<#dataset> tdb:unionDefaultGraph true" in the
> Fuseki config file and this means that when we query the data we can
> forget about the named graphs as it is transparent to the query. When we do
> updates though strange things can happen if we don't specify the graph name
> in the right part of the query. I can't say how it impacts on reasoning as
> we don't use this at present.
>
> >
> > Finally, I'm obviously a complete neophyte. Am I in the wrong group? I
> > don't want to put noise in the channel.
> >
>
> Being a neophyte is cool - welcome! Whether this is the right group
> depends on whether your questions relate to Jena specifics or not.. it seems
> to me they do, at least in part..
>
> Rob
>
>
> >
> > Thanks again!
> >
> > On Thu, Dec 11, 2014 at 12:20 PM, Rob Walpole <[email protected]>
> > wrote:
> >
> > > Hi Nate,
> > >
> > > I'm not sure what you mean by an "ontology management workflow" exactly
> > > and I can't comment on whether your approach is a good one or not... but
> > > what we have done is to create our own ontology which as far as possible
> > > reuses or extends other pre-existing ontologies (e.g. central-government,
> > > Dublin Core etc.). This ontology consists of a load of classes, object
> > > properties and data properties which are used inside our actual data. The
> > > ontology (or TBox - http://en.wikipedia.org/wiki/Tbox) and data (or ABox -
> > > http://en.wikipedia.org/wiki/Abox) components exist as separate datasets
> > > and we have found it convenient to store them as separate named graphs
> > > within our triplestore - mainly so that the ontology component can be
> > > updated easily by dropping and reloading the graph.
> > >
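
One way to express "dropping and reloading the graph" is a two-step SPARQL Update request. This is only a sketch: the thread does not say which mechanism is actually used, the function name, graph IRI and file URL are hypothetical, and whether a server is allowed to LOAD from a given URL depends on its configuration.

```python
# Hypothetical helper: builds a SPARQL 1.1 Update request that replaces
# the contents of one named graph with a freshly exported ontology file.

def drop_and_reload(graph_iri: str, source_url: str) -> str:
    """Return an update that drops graph_iri and reloads it from source_url."""
    return (
        # SILENT avoids an error if the graph does not exist yet.
        f"DROP SILENT GRAPH <{graph_iri}> ;\n"
        f"LOAD <{source_url}> INTO GRAPH <{graph_iri}>\n"
    )


request = drop_and_reload(
    "http://example.org/graph/tbox",
    "file:///data/ontology.ttl",
)
print(request)
```

Because only the ontology graph is dropped, terms removed from the ontology disappear from the store while the instance data in other graphs is left alone.
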
> > > We manage the ontology using Protege and I have to say I find modelling
> > > things in Protege saves me from wasting huge amounts of time as it forces
> > > me to model things up front before I start fiddling about with the data.
> > > I find the OntoGraf plugin particularly helpful when I need to visualise
> > > relationships and when discussing requirements with users. Protege also
> > > allows you to save the ontology as an RDF file which you can load straight
> > > into your triplestore (Jena TDB in our case).
> > >
> > > We also keep a number of named individuals in the ontology itself. These
> > > are for things that are entities but what I think of (coming from a Java
> > > background) as statics. They are the entities which are very unlikely to
> > > change and if they do then I am happy to edit them within the ontology.
> > >
> > > Hope that helps in some way.
> > >
> > > Rob
> > >
> > > Rob Walpole
> > > Email [email protected]
> > > Tel. +44 (0)7969 869881
> > > Skype: RobertWalpole
> > > http://www.linkedin.com/in/robwalpole
> > >
> > >
> > > On Thu, Dec 11, 2014 at 12:30 PM, Nate Marks <[email protected]>
> > > wrote:
> > >
> > > > I'm trying to get my arms around an ontology management workflow. I've
> > > > been reading the docs on the Apache Jena site and a couple of books. I
> > > > was hoping to test my understanding of the technology by sharing my
> > > > current plan and gathering some feedback.
> > > >
> > > > Thanks in advance if you have the time to comment!
> > > >
> > > >
> > > > I intend to tightly manage a pretty broad ontology. Let's say it
> > > > includes assets, locations, people and workflows.
> > > >
> > > > I think I want to have a single "schema" file that describes the asset
> > > > class hierarchy and the rules for validating assets based on properties,
> > > > disjointness etc.
> > > >
> > > > Then I might have a bunch of other "data" files that enumerate all the
> > > > assets using that first "schema" file.
> > > >
> > > > I'd repeat this structure using a schema file each for locations,
> > > > people, workflows.
> > > >
> > > > Having created these files, I think I can use an assembler file to pull
> > > > them into a single model.
> > > >
> > > > Ultimately, I expect to query the data using Fuseki and this is where I
> > > > get a little hazy. I think the assembler can pull the files into a single
> > > > memory model, then I can write it to a tdb.
> > > >
> > > > Is that necessary, though? It's a simple bit of Java, but I have the
> > > > nagging feeling that there's a shorter path to automatically
> > > > load/validate those files for Fuseki.
> > > >
> > > >
> > > > Is this approach to organizing the files sound?
> > > >
> > >
> >
>
