Re: [CODE4LIB] rdf triplestores

2014-12-22 Thread Forrest, Stuart
Hi Jeff

So are triple stores just a means to an end, a vehicle for storing one type of 
data (i.e., graph data), the way Access stores relational data?

On the path to learning this, what software should I install to experiment?

Thanks

Stuart



Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net

http://www.beaufortcountylibrary.org

For Leisure, For Learning, For Life






Re: [CODE4LIB] rdf triplestores

2014-12-22 Thread Sarah Weissman
Jeff (and Hugh): Thanks for the clarification.



 The broader issue of comparing a relational database to a triple store has
 to do with limitations of RDBs that are not present when using triples. By
 nature RDBs are rigidly defined with set tables and properties. In RDF there
 are no real restrictions. Assuming you understand the model for the data in
 the graph, you can consume more data without crosswalking or converting it
 and then query against the data based on the model that it uses.


I can see how the lack of a fixed relationship scheme would be useful. I am
considering using a triple store for a project where we are annotating data
and we might want to expand the types of annotations we make over time
(e.g., semantic tags, machine learning results). And where the particular
way we refer to the data that is being annotated might depend on the type
of annotation (e.g. a data set ID, an image region).
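
A minimal sketch of the flexibility being described, assuming triples are kept as plain Python tuples. The identifiers (`ex:dataset42`, `ex:hasTag`, and so on) are invented for illustration; the point is only that a new kind of annotation is just more triples with a new predicate, with no table to alter.

```python
# Annotations as (subject, predicate, object) triples; all names are hypothetical.
annotations = [
    ("ex:dataset42", "ex:hasTag", "ex:climate"),
    ("ex:dataset42", "ex:hasTag", "ex:ocean"),
]

# Later, a new annotation type appears (a machine learning result on an image
# region). No schema migration is needed: we simply add triples that use
# predicates and subject types the original data never mentioned.
annotations += [
    ("ex:image7#region1", "ex:partOf", "ex:image7"),
    ("ex:image7#region1", "ex:predictedLabel", "ex:coin"),
    ("ex:image7#region1", "ex:confidence", 0.93),
]


def describe(triples, subject):
    """Collect everything known about one subject, whatever its predicates."""
    info = {}
    for s, p, o in triples:
        if s == subject:
            info.setdefault(p, []).append(o)
    return info


print(describe(annotations, "ex:dataset42"))
print(describe(annotations, "ex:image7#region1"))
```

The trade-off is that nothing stops inconsistent predicates from creeping in, so the "model" still has to be agreed on somewhere outside the store.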


 The other major difference is that nodes in graph data almost always (or at
 least should) have a persistent identifier. So 'Jane Austen', 'Austen,
 Jane', and '奥斯丁, 1775-1817' would all have the same identifier and
 consequently a query could be made using just that one identifier to find
 related entities. This is possible in a relational database but it would
 require a very well-constructed table system and an extremely high level of
 maintenance and quality checking in order to keep it up to date.


I'm not sure the uniqueness of the persistent identifier is a big selling
point (to me) of a triple store. It's possible to do what you are saying in
a relational database, but it would be really bad design not to have a
primary key for your author table. It seems like the strong selling point
is to use the same set of persistent identifiers as someone else, so that
you are speaking the same language. Otherwise your unique ID for Jane
Austen is just as good as my unique ID in my relational authors table.
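
The point about shared identifiers can be illustrated with a small sketch. Everything here is hypothetical, including the `http://example.org/id/austen` URI and the two datasets; it simply shows that when two sources use the same identifier their statements merge with no mapping step, whereas local keys need a crosswalk.

```python
SHARED = "http://example.org/id/austen"  # hypothetical shared identifier

# Several labels, one identifier.
labels = {
    "Jane Austen": SHARED,
    "Austen, Jane": SHARED,
    "奥斯丁, 1775-1817": SHARED,
}

# Two independent sources that happen to use the same identifier.
library_a = [(SHARED, "ex:wrote", "ex:pride")]
library_b = [(SHARED, "ex:bornIn", "ex:steventon")]

# Merging is just set union; no key translation is needed.
merged = set(library_a) | set(library_b)

# By contrast, two relational tables with local primary keys need a crosswalk
# before their rows can be combined.
authors_a = {17: "Austen, Jane"}      # local key in system A
authors_b = {9041: "Jane Austen"}     # local key in system B
crosswalk = {17: 9041}                # must be built and maintained by hand


def facts_about(label):
    """Resolve any known label to its identifier, then list merged facts."""
    uri = labels[label]
    return sorted((p, o) for s, p, o in merged if s == uri)


print(facts_about("奥斯丁, 1775-1817"))
```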

One of my concerns, apart from making the business case for triple stores
to an organization that is heavily invested in relational DB technology, is
that when I've experimented with importing RDF data that has come out of a
triple store into a relational DB, I have had issues with things like
relations pointing to persistent identifiers that don't exist in the
current namespace (perhaps this is a feature, not a bug?) and loops in
relationship graphs that shouldn't have loops. Relational DBs aren't any
good at finding loops, but you'd think a graph DB would be set up to detect
that kind of thing. This makes me wonder if the technology is really all
that mature.
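
Both checks mentioned here are straightforward to run over exported triples before loading them elsewhere. The sketch below assumes a hypothetical `ex:broader` hierarchy that is supposed to be loop-free. RDF itself does not forbid either dangling references or cycles, so a triplestore will generally accept them unless you add validation yourself.

```python
def dangling_objects(triples, predicate):
    """Objects of `predicate` that never appear as a subject in the data."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for _, p, o in triples if p == predicate and o not in subjects})


def find_cycle(triples, predicate):
    """Return one cycle along `predicate` as a list of nodes, or None."""
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(edges):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Hypothetical export: a subject hierarchy with one loop and one missing node.
data = [
    ("ex:novels", "ex:broader", "ex:fiction"),
    ("ex:fiction", "ex:broader", "ex:literature"),
    ("ex:literature", "ex:broader", "ex:novels"),   # loop
    ("ex:poetry", "ex:broader", "ex:verse"),        # ex:verse is never described
]

print(dangling_objects(data, "ex:broader"))
print(find_cycle(data, "ex:broader"))
```

The recursive walk is fine for small exports; a very deep hierarchy would need an iterative version to stay within Python's recursion limit.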

-Sarah






 
 From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Sarah
 Weissman seweiss...@gmail.com
 Sent: Friday, December 19, 2014 2:05 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] rdf triplestores

 Jeff,


  With graph data it is much easier to search for an author (let's say Jane
  Austen) and find not only all of the books that she authored but also all
  of the books about her, all of the books that are about similar topics,
  published in similar periods. One can then imagine hopping from the Jane
  Austen node on the graph to a node that is a book she wrote (say Pride and
  Prejudice) and then to a subject node for the book (say Social
  Classes--Fiction). From there you could then find all of the authors that
  wrote books about that same topic and then navigate to those books.
 
 
 When you say that it would be easier to discover these other relations
 from the Jane Austen node, do you mean that you can query for relations in
 a triplestore/graph DB more readily (efficiently?) than you can in an RDB?
 It seems like the equivalent in the RDB model would be, given a piece of
 data used in a FK column in a table, to query for (if you even could) what
 other tables use the same FK, then query these tables, constraining to the
 Jane Austen value to see whether or not they had any data, which is not a
 natural way of using an RDB.
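
One way to see the difference being asked about is to put the two side by side. The sketch below uses Python's built-in `sqlite3` with an invented two-table schema, plus the same facts as triples. In the relational version the query has to name each table and column that references the author; in the triple version a single pattern finds every statement that points at the Austen identifier, whatever the relationship is.

```python
import sqlite3

# Relational version: hypothetical schema with two tables referencing authors.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books_by (book TEXT, author_id INTEGER REFERENCES authors(id));
    CREATE TABLE books_about (book TEXT, author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Jane Austen');
    INSERT INTO books_by VALUES ('Pride and Prejudice', 1);
    INSERT INTO books_about VALUES ('Jane Austen: A Life', 1);
""")

# Each referencing table must be known in advance and named explicitly.
sql_rows = db.execute("""
    SELECT 'books_by', book FROM books_by WHERE author_id = 1
    UNION ALL
    SELECT 'books_about', book FROM books_about WHERE author_id = 1
""").fetchall()

# Triple version: the same facts, with hypothetical ex: identifiers.
triples = [
    ("ex:pride", "ex:author", "ex:austen"),
    ("ex:austen_life", "ex:about", "ex:austen"),
    ("ex:pride", "ex:title", "Pride and Prejudice"),
]

# One pattern, (?s ?p ex:austen), covers every relationship at once.
incoming = [(p, s) for s, p, o in triples if o == "ex:austen"]

print(sql_rows)
print(incoming)
```

This says nothing about which is faster; it only shows that the graph form lets you ask "what points at this node?" without knowing the schema first.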

 -Sarah



Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Forrest, Stuart
Hi All

My question is what do you guys use triplestores for?

Thanks
Stuart



Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net

http://www.beaufortcountylibrary.org

For Leisure, For Learning, For Life



-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stefano 
Bargioni
Sent: Monday, November 11, 2013 8:53 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

My +1 for Joseki.
sb

On 11/nov/2013, at 06.12, Eric Lease Morgan wrote:

 What is your favorite RDF triplestore?
 
 I am able to convert numerous library-related metadata formats into RDF/XML. 
 In a minimal way, I can then contribute to the Semantic Web by simply putting 
 the resulting files on an HTTP file system. But if I were to import my 
 RDF/XML into a triplestore, then I could do a lot more. Jena seems like a 
 good option. So does OpenLink Virtuoso. 
 
 What experience do y'all have with these tools, and do you know how to import 
 RDF/XML into them?
 
 -- 
 Eric Lease Morgan
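
To make the "import RDF/XML" step less abstract, here is a rough sketch of what an importer has to do first: turn the XML into triples. It uses only Python's standard library and handles just the simplest flat `rdf:Description` form, which is an assumption made for this example; real RDF/XML has many more constructs, and each triplestore ships its own loader that should be used in practice. The sample document and its `example.org` URIs are made up.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A tiny, hypothetical RDF/XML document.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/1">
    <dc:title>Pride and Prejudice</dc:title>
    <dc:creator rdf:resource="http://example.org/id/austen"/>
  </rdf:Description>
</rdf:RDF>
"""


def simple_rdfxml_to_triples(xml_text):
    """Extract triples from flat rdf:Description elements only.

    Ignores typed nodes, blank nodes, nesting, datatypes and language tags.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, predicate, obj))
    return triples


for triple in simple_rdfxml_to_triples(SAMPLE):
    print(triple)
```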
 


Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Mixter,Jeff
A triplestore is basically a database backend for RDF triples. The major 
benefit is that it allows for SPARQL querying. You could think of a triplestore 
as the RDF counterpart of a relational database that can be queried with SQL.
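
To make the SQL analogy concrete, here is a minimal sketch in Python of what a triplestore holds and what a single SPARQL-style triple pattern does. It is only an illustration: the `ex:` identifiers and the `match` helper are made up for this example, and a real triplestore indexes its triples rather than scanning a list.

```python
# Toy triples: (subject, predicate, object). All ex: names are hypothetical.
TRIPLES = [
    ("ex:austen", "ex:name", "Jane Austen"),
    ("ex:pride", "ex:title", "Pride and Prejudice"),
    ("ex:pride", "ex:author", "ex:austen"),
    ("ex:emma", "ex:title", "Emma"),
    ("ex:emma", "ex:author", "ex:austen"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# Roughly: SELECT ?book WHERE { ?book ex:author ex:austen }
books = [s for (s, _, _) in match(TRIPLES, p="ex:author", o="ex:austen")]
print(books)
```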

The drawback that I have run into is that unless you have unlimited hardware, 
triplestores can run into scaling problems (when you are looking at hundreds of 
millions or billions of triples). This is a problem when you want to search for 
data. For searching I use a hybrid approach: an Elasticsearch (i.e., Lucene) 
index for the string literals, and then I go out to the triplestore to query 
for the data.
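
The hybrid setup described above can be sketched as follows. This is a toy stand-in, not Elasticsearch or any real triplestore API: a small in-memory inverted index plays the role of the search index over string literals, and a plain list plays the role of the triplestore. All identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: literals are plain strings, everything else is an ID.
TRIPLES = [
    ("ex:austen", "ex:name", "Jane Austen"),
    ("ex:pride", "ex:title", "Pride and Prejudice"),
    ("ex:pride", "ex:author", "ex:austen"),
    ("ex:emma", "ex:title", "Emma"),
    ("ex:emma", "ex:author", "ex:austen"),
]
LITERAL_PREDICATES = {"ex:name", "ex:title"}


def build_text_index(triples):
    """Map each lowercased token of a literal to the subjects that carry it."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p in LITERAL_PREDICATES:
            for token in o.lower().split():
                index[token].add(s)
    return index


def search(index, text):
    """Search step: subjects whose literals contain every query token."""
    hits = [index.get(tok, set()) for tok in text.lower().split()]
    return set.intersection(*hits) if hits else set()


def works_by(triples, author_id):
    """Query step: follow identifiers, no string matching involved."""
    return sorted(s for s, p, o in triples if p == "ex:author" and o == author_id)


index = build_text_index(TRIPLES)
authors = search(index, "jane austen")                 # search finds the identifier
results = {a: works_by(TRIPLES, a) for a in authors}   # query uses it
print(results)
```

The split matters because the string work happens once, at indexing time, and the graph query only ever sees identifiers.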

If you are looking to use a triplestore it is important to distinguish between 
search and query.

Triplestores are really good for query but not so good for search. The basic 
problem with search is that it is mostly string based, and this requires a 
regular expression query in SPARQL, which is expensive from a hardware 
perspective. 

There are a few triple stores that use a hybrid model, in particular Jena 
Fuseki (http://jena.apache.org/documentation/query/text-query.html).

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org



Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Ethan Gruber
I recently extended Fuseki to hook into a Solr index for geographic query
for one of our linked data projects, and I'm happy with the results so far.
It will open the door for us to build more sophisticated geographic
visualizations. I have not extended Fuseki for Lucene/Solr based full text
search, as we have a standalone Solr index for that, and a separate search
interface (for general users) from the SPARQL query interface (for advanced
ones).

It's definitely true that there are scaling limitations in SPARQL--just
look at how often DBpedia and the British Museum SPARQL endpoints go down.
Hardware is overcoming these limitations, but I still advocate a hybrid
approach: using Solr where it is advantageous to do so, and then building
focused user interfaces on top of SPARQL, leveraging the advantages of a
triplestore in contexts other than search. We open up our SPARQL endpoint
to the public, but far more users interact with SPARQL through HTML
interfaces in several different projects without having any idea that they
are doing so. We only have about a million triples in our triplestore (but
this is going to grow enormously in less than two years, I think, as the
floodgates are about to open in the world of ancient Greco-Roman coins),
but the system has only gone down for about 2 minutes in the last 2.5
years, on a virtual machine with only 4GB of memory.

Ethan


Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Forrest, Stuart
Thanks Jeff

Interesting concept. Can you give me any examples of their usage, what kinds of 
data, etc.?

Thanks


Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net

http://www.beaufortcountylibrary.org

For Leisure, For Learning, For Life





Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Tom Johnson
DPLA is working on moving to a more RDF-aware stack, including Marmotta[1]
as a triplestore, Linked Data Platform server, and Linked Data cache layer.

You can check out our data model[2], which we use as a common format for
special collections/archives/museum metadata aggregated from our partners.
Marmotta gives us RDF persistence with graph query via SPARQL, and a REST
interface via LDP[3].  Most/all of our actual interactions with the data
are mediated by ActiveTriples[4], an ORM-like interface to RDF resources.
From there, it's just like any other application, with the benefits (and
pitfalls) offered by a graph model, Open World, URIs, etc. becoming
tangible from time to time.

[1] http://marmotta.apache.org/
[2] http://dp.la/info/wp-content/uploads/2013/04/DPLA-MAP-V3.1-2.pdf
[3] http://www.w3.org/TR/ldp/
[4] https://github.com/ActiveTriples/ActiveTriples
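
As a rough illustration of what an "ORM-like interface to RDF resources" can mean, here is a toy Python class that exposes the triples about one subject through named properties. This is not the ActiveTriples API (that is a Ruby library); the `Resource` class, its methods, and the `example.org` URIs are all invented for this sketch.

```python
class Resource:
    """Attribute-style view over the triples about one subject (toy sketch)."""

    # Hypothetical mapping from Python property names to predicate URIs.
    PROPERTIES = {
        "title": "http://purl.org/dc/terms/title",
        "creator": "http://purl.org/dc/terms/creator",
    }

    def __init__(self, uri, graph):
        self.uri = uri
        self.graph = graph  # a shared set of (s, p, o) tuples

    def get(self, name):
        """All values for a mapped property, sorted for stable output."""
        predicate = self.PROPERTIES[name]
        return sorted(o for s, p, o in self.graph if s == self.uri and p == predicate)

    def add(self, name, value):
        """Add one value; RDF properties are multi-valued by default."""
        self.graph.add((self.uri, self.PROPERTIES[name], value))


graph = set()
item = Resource("http://example.org/item/1", graph)
item.add("title", "Letter to Cassandra")
item.add("creator", "http://example.org/id/austen")
item.add("creator", "http://example.org/id/unknown-scribe")

print(item.get("title"))
print(item.get("creator"))
print(len(graph))
```

One place the graph model becomes tangible in application code: every property returns a list, because nothing in the data limits a resource to a single title or creator.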


Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Mixter,Jeff
Stuart,

Since triplestores, in essence, store graph data, I think a slightly better 
question is what you can do with graph data (if you do not mind me rephrasing 
your question).

From this perspective I would point to Facebook or LinkedIn as prime examples 
of what can be done with graph data. Obviously those do not necessarily 
translate well into what can be done with library graph data, but they do show 
the potential. For libraries, I think one of the benefits will be 
expanded/enhanced discoverability for resources. 

With graph data it is much easier to search for an author (let's say Jane 
Austen) and find not only all of the books that she authored but also all of 
the books about her, all of the books that are about similar topics, published 
in similar periods. One can then imagine hopping from the Jane Austen node on 
the graph to a node that is a book she wrote (say Pride and Prejudice) and then 
to a subject node for the book (say Social Classes--Fiction). From there you 
could then find all of the authors that wrote books about that same topic and 
then navigate to those books.
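
The hops just described can be sketched in a few lines of Python. The data is invented for illustration (the `ex:` identifiers and the subject assignments are assumptions, not catalog data), and the two dictionaries stand in for the indexes a real graph store would maintain.

```python
from collections import defaultdict

# Hypothetical identifiers; a real dataset would use URIs.
TRIPLES = [
    ("ex:pride", "ex:author", "ex:austen"),
    ("ex:pride", "ex:subject", "ex:social_classes_fiction"),
    ("ex:north_south", "ex:author", "ex:gaskell"),
    ("ex:north_south", "ex:subject", "ex:social_classes_fiction"),
    ("ex:middlemarch", "ex:author", "ex:eliot"),
    ("ex:middlemarch", "ex:subject", "ex:social_classes_fiction"),
    ("ex:walden", "ex:author", "ex:thoreau"),
    ("ex:walden", "ex:subject", "ex:nature"),
]

# Index each predicate in both directions so every hop is a dictionary lookup.
forward = defaultdict(set)   # (subject, predicate) -> objects
backward = defaultdict(set)  # (object, predicate) -> subjects
for s, p, o in TRIPLES:
    forward[(s, p)].add(o)
    backward[(o, p)].add(s)


def related_authors(author):
    """author -> books -> subjects -> other books -> their authors."""
    found = defaultdict(set)
    for book in backward[(author, "ex:author")]:
        for subject in forward[(book, "ex:subject")]:
            for other_book in backward[(subject, "ex:subject")] - {book}:
                for other_author in forward[(other_book, "ex:author")] - {author}:
                    found[other_author].add(other_book)
    return {a: sorted(b) for a, b in found.items()}


print(related_authors("ex:austen"))
```

Every step compares identifiers, never strings, which is why variant spellings of a name or subject heading do not break the chain.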

Our current ILS systems try to do this with MARC records but because they are 
mostly string based, it is very difficult to accurately provide this type of 
information to users. Graph data helps overcome this hurdle.

This was a rather basic example of how end-users can benefit from graph data 
but I think it is a compelling reason.

I have attached a simple image to help visualize what I was talking about. In 
it the user would start by finding Author1 and then using the graph we (the 
library) could suggest that they might like Book2 (since it is about the same 
subject) or even Book3 (since it is by Author2 who wrote a book, Book2, that 
shared a common subject, Subject1, with the author, Author1, that was 
originally searched for). Again, this is very basic but would be rather 
difficult to do with a string-based record system.
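
Since the attached image is not reproduced here, the following sketch rebuilds the graph from the description and walks it breadth-first, so that nearer books are suggested before more distant ones. The `Book1` node linking Author1 to Subject1 is an assumption, as are the function name and the shortcut of recognising books by their name prefix.

```python
from collections import deque

# The graph from the description, as undirected links between node names.
EDGES = [
    ("Author1", "Book1"),
    ("Book1", "Subject1"),
    ("Subject1", "Book2"),
    ("Book2", "Author2"),
    ("Author2", "Book3"),
]


def suggest_books(start, edges, max_hops=5):
    """Breadth-first walk from `start`; return (book, hops) nearest first."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    # Skip books one hop away: the author's own works are already known.
    books = [(n, d) for n, d in hops.items() if n.startswith("Book") and d > 1]
    return sorted(books, key=lambda item: (item[1], item[0]))


print(suggest_books("Author1", EDGES))
```

Lowering `max_hops` trims the more speculative suggestions, which is one simple way to control how far recommendations wander from the original search.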

If you wanted to add complexity, you could start talking about discovery of 
multilingual items for bilingual users (since graph data should be language 
neutral).

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org


From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest, 
Stuart sforr...@bcgov.net
Sent: Friday, December 19, 2014 10:32 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

Thanks Jeff

Interesting concept, can you give me any examples of their usage, what kinds of 
data etc.?

Thanks


Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net

http://www.beaufortcountylibrary.org

For Leisure, For Learning, For Life




-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
Mixter,Jeff
Sent: Friday, December 19, 2014 10:20 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

A triplestore is basically a database backend for RDF triples. The major 
benefit is that it allows for SPARQL querying. You could imagine a triplestore 
as being the same thing as a relational database that can be queried with SQL.

The drawback that I have run into is that unless you have unlimited hardware,
triplestores can run into scaling problems (when you are looking at hundreds of
millions or billions of triples). This is a problem when you want to search for
data. For searching I use a hybrid: an Elasticsearch (i.e. Lucene) index for
the string literals, and then go out to the triplestore to query for the data.

If you are looking to use a triplestore it is important to distinguish between 
search and query.

Triplestores are really good for query but not so good for search. The basic
problem with search is that it is mostly string based, and this requires a
regular expression query in SPARQL, which is expensive from a hardware
perspective.
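
The search/query split can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a dictionary of words stands in for the full-text index (it is not the Elasticsearch API), a list of tuples stands in for the triplestore, and the ex: identifiers and predicate names are invented.

```python
# Illustrative data only: identifiers, predicates, and labels are invented.
TRIPLES = [
    ("ex:austen", "label", "Jane Austen"),
    ("ex:pride", "label", "Pride and Prejudice"),
    ("ex:pride", "author", "ex:austen"),
    ("ex:emma", "label", "Emma"),
    ("ex:emma", "author", "ex:austen"),
]


def build_text_index(triples):
    """Map each lowercased word of a label to the nodes carrying that label."""
    index = {}
    for s, p, o in triples:
        if p == "label":
            for word in o.lower().split():
                index.setdefault(word, set()).add(s)
    return index


def search(index, text):
    """Search step: resolve free text to node identifiers through the word index."""
    hits = [index.get(word, set()) for word in text.lower().split()]
    return set.intersection(*hits) if hits else set()


def books_by(triples, author_id):
    """Query step: exact match on identifiers instead of a regex over literals."""
    return sorted(s for s, p, o in triples if p == "author" and o == author_id)


index = build_text_index(TRIPLES)
for node in search(index, "jane austen"):
    print(node, books_by(TRIPLES, node))
```

The string work happens once, in the index; the graph is only asked about identifiers it can match exactly.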

There are a few triple stores that use a hybrid model, in particular Jena
Fuseki (http://jena.apache.org/documentation/query/text-query.html).

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org


From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest, 
Stuart sforr...@bcgov.net
Sent: Friday, December 19, 2014 10:00 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

Hi All

My question is what do you guys use triplestores for?

Thanks
Stuart



Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net

http://www.beaufortcountylibrary.org

For Leisure, For Learning, For Life



-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stefano 
Bargioni
Sent: Monday, November

Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Wilkens, Ann
Thank you.

On 12/19/14, 9:20 AM, Mixter,Jeff mixt...@oclc.org wrote:

A triplestore is basically a database backend for RDF triples. The major
benefit is that it allows for SPARQL querying. You could imagine a
triplestore as being the same thing as a relational database that can be
queried with SQL.

The drawback that I have run into is that unless you have unlimited
hardware, triplestores can run into scaling problems (when you are
looking at hundreds of millions or billions of triples). This is a
problem when you want to search for data. For searching I use a hybrid:
an Elasticsearch (i.e. Lucene) index for the string literals, and then go
out to the triplestore to query for the data.

If you are looking to use a triplestore it is important to distinguish
between search and query.

Triplestores are really good for query but not so good for search. The
basic problem with search is that it is mostly string based, and this
requires a regular expression query in SPARQL, which is expensive from a
hardware perspective.

There are a few triple stores that use a hybrid model. In particular Jena
Fuseki (http://jena.apache.org/documentation/query/text-query.html)

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org



Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Sarah Weissman
Jeff,


 With graph data it is much easier to search for an author (let's say Jane
 Austen) and find not only all of the books that she authored but also all
 of the books about her, all of the books that are about similar topics,
 published in similar periods. One can then imagine hopping from the Jane
 Austen node on the graph to a node that is a book she wrote (say Pride and
 Prejudice) and then to a subject node for the book (say Social
 Classes--Fiction). From there you could then find all of the authors that
 wrote books about that same topic and then navigate to those books.


When you say that it would be easier to discover these other relations
from the Jane Austen node, do you mean that you can query for relations in
a triplestore/graph DB more readily (efficiently?) than you can in an RDB?
It seems like the equivalent in the RDB model would be, given a piece of
data used in a FK column in a table, to query for (if you even could) what
other tables use the same FK, then query these tables, constraining to the
Jane Austen value to see whether or not they had any data, which is not a
natural way of using an RDB.

-Sarah



Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Hugh Cayless
That's pretty much it. There are operations that are completely natural to
a graph db that require either many joins or multiple queries to achieve
with an RDBMS.

It all depends very much on what sorts of data you're dealing with, and how
you want to model that data. Graph databases can certainly be faster/more
efficient at querying data with many joins than RDBMSs are. It's
(unsurprisingly) easier to model data that looks like a web of
relationships in a graph db than in an RDBMS. They're not so great at
dealing with regular, record-shaped data on the other hand, nor
document-shaped data for that matter.

They are a very useful tool to have in your kit and using them will likely
change the way you think about data modeling in a good way.


Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Mixter,Jeff
Sarah,

I should have probably chosen a different word, or at least explained it 
better. One of the advantages that SPARQL has over SQL is simplicity of the 
syntax. There are many simple SPARQL queries that in SQL would require multiple 
outer joins. This simplicity does not necessarily relate to efficiency but it 
is worth noting.
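
To make the syntax comparison concrete, here is a small sketch using Python's built-in sqlite3 module. The schema, table names, and data are hypothetical, and the SPARQL is shown as text only (it is not executed here, and the ex: prefix and property names are assumptions rather than a real vocabulary). The question asked is "list Jane Austen's books, with their subject headings if they have any".

```python
import sqlite3

# Hypothetical relational schema; table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                   author_id INTEGER REFERENCES author(id));
CREATE TABLE subject (id INTEGER PRIMARY KEY, heading TEXT);
CREATE TABLE book_subject (book_id INTEGER REFERENCES book(id),
                           subject_id INTEGER REFERENCES subject(id));
INSERT INTO author VALUES (1, 'Jane Austen');
INSERT INTO book VALUES (1, 'Pride and Prejudice', 1), (2, 'Emma', 1);
INSERT INTO subject VALUES (1, 'Social classes--Fiction');
INSERT INTO book_subject VALUES (1, 1);
""")

# SQL needs one join per hop, and outer joins so books without subjects survive.
SQL = """
SELECT book.title, subject.heading
FROM author
JOIN book ON book.author_id = author.id
LEFT OUTER JOIN book_subject ON book_subject.book_id = book.id
LEFT OUTER JOIN subject ON subject.id = book_subject.subject_id
WHERE author.name = 'Jane Austen'
ORDER BY book.title
"""
rows = conn.execute(SQL).fetchall()
print(rows)

# Roughly equivalent SPARQL, as text only. OPTIONAL plays the role of the outer joins.
SPARQL = """
PREFIX ex: <http://example.org/>
SELECT ?title ?heading WHERE {
  ?book ex:author ex:JaneAusten ;
        ex:title ?title .
  OPTIONAL { ?book ex:subject/ex:heading ?heading }
}
ORDER BY ?title
"""
```

The SQL is not hard, but it has to know about the link table and spell out each join; the SPARQL pattern just follows the edges.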

The broader issue of comparing a relational database to a triplestore has to
do with limitations of RDBs that are not present when using triples. By nature
RDBs are rigidly defined with set tables and properties. In RDF there are no
real restrictions. Assuming you understand the model for the data in the graph,
you can consume more data without crosswalking or converting it and then query
against the data based on the model that it uses.

So with SQL, the structure of the database defines the queries you can make.
Conversely, with SPARQL the queries are defined by the data that is in the
triplestore (i.e. the database is agnostic).

The other major difference is that nodes in graph data almost always have (or
at least should have) a persistent identifier. So 'Jane Austen', 'Austen, Jane',
and '奥斯丁, 1775-1817' would all have the same identifier, and consequently a
query could be made using just that one identifier to find related entities.
This is possible in a relational database, but it would require a very well
constructed table system and an extremely high level of maintenance and quality
checking in order to keep it up to date. Most of the relational databases I
have worked with either use strings for this type of search or, if identifiers
are present, generate them from the unique string in a certain field
(such as Name). The latter of these scenarios works well if you only have one
language, but as soon as you start dealing with multi-lingual data or dirty data
(e.g. Jan Austin instead of Jane Austen) you run into problems.
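
A minimal sketch of the identifier idea in plain Python, assuming a made-up identifier (ex:person/austen) and two dictionaries standing in for the graph. It is not how any particular system stores its data.

```python
# Hypothetical identifier and data, for illustration only.
LABELS = {
    "ex:person/austen": ["Jane Austen", "Austen, Jane", "奥斯丁, 1775-1817"],
}
WORKS = {
    "ex:person/austen": ["Pride and Prejudice", "Emma"],
}

# Reverse lookup: every known label resolves to the same identifier.
ID_BY_LABEL = {
    label: ident for ident, labels in LABELS.items() for label in labels
}


def works_for(label):
    """Resolve a name form to its identifier, then look up works by identifier."""
    ident = ID_BY_LABEL.get(label)
    return WORKS.get(ident, [])


print(works_for("Austen, Jane"))
print(works_for("奥斯丁, 1775-1817"))
print(works_for("Jan Austin"))
```

All recorded name forms lead to the same works because the lookup runs on the identifier, not the string. The misspelling finds nothing until someone records it as another label for the same identifier, which is a one-line data change rather than a schema change.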

I am not sure if this answered your question or not.

Here are some resources on SPARQL queries that I have used in the past:

http://www.cambridgesemantics.com/semantic-university/sparql-vs-sql-intro

http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/Bizer-Schultz-Berlin-SPARQL-Benchmark-IJSWIS.pdf

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org



Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Mixter,Jeff
Stuart,

This presentation was given at the Code4Lib conference in 2009. It is a good 
starting point.

http://www.slideshare.net/iandavis/30-minute-guide-to-rdf-and-linked-data

I will dig around and try to find some other presentations or
documents/articles that could be used for introductory purposes.

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org


From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest, 
Stuart sforr...@bcgov.net
Sent: Friday, December 19, 2014 2:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

This all sounds really interesting, can anyone recommend a resource for 
learning what it's all about and what it can be used for?

Stuart



Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net

http://www.beaufortcountylibrary.org

For Leisure, For Learning, For Life





Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Forrest, Stuart
Jeff

Thanks I appreciate it.

Stuart



Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net

http://www.beaufortcountylibrary.org

For Leisure, For Learning, For Life





Re: [CODE4LIB] rdf triplestores

2014-12-19 Thread Conal Tuohy
One thing I've been using a triple store for recently is to model a
lexicographic dataset extracted from a bunch of TEI files. The TEI XML
files are transcriptions of lexicons of various Australian aboriginal
languages; tables of English language words, with their equivalents
supplied by native speakers of those languages, in outback Australia in the
early 20th C.

For this aboriginal language project I wrote XSLT that converts one of
these TEI files into an RDF/XML file in which the lexicographic data in the
TEI is encoded in SKOS (a thesaurus vocabulary). I apply that stylesheet to
each TEI file, and take the resulting RDF/XML file and store it in the RDF
graph store with an HTTP PUT. Then I wrote SPARQL queries to query over the
union of all those graphs, to extract statistics and analyze the full
dataset.

Using a triple store and a SPARQL query interface makes it much easier and
more efficient to query the lexicographic data than it would be to query it
directly from the TEI XML, using e.g. XQuery.

For my triple store I chose to use Apache Fuseki, because it implements all
the SPARQL 1.1 protocols including the Graph Store HTTP Protocol 
http://www.w3.org/TR/sparql11-http-rdf-update/. The crucial thing with the
SPARQL 1.1 HTTP Graph Store protocol is that your unit of data management
is not at the level of individual triples, but at the level of groups of
triples - Named Graphs - which are very much the same as the concept of a
record in traditional data management systems. So although it's possible
to use the older SPARQL Update Protocol to manage your RDF data, I think
it's generally much easier to use the SPARQL Graph Store HTTP Protocol
interface to keep the RDF up to date and in synch with the source data.

In the SPARQL Update Protocol, you send the SPARQL server a command that
inserts and/or deletes triples, so it's a Remote Procedure Call style of
protocol. The Graph Store protocol, by contrast, is resource-oriented
(RESTful): you simply identify a bunch of triples (a Named Graph) and use
HTTP PUT to overwrite them with a new bunch of triples, DELETE to remove
them altogether, or POST to add new triples to the graph.
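
The three Graph Store operations can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the endpoint URL and the graph URI are made up, and the function only builds the HTTP request without sending it. The named graph is identified with the protocol's "graph" query parameter.

```python
import urllib.parse
import urllib.request

# Hypothetical Graph Store endpoint of a local Fuseki dataset; adjust to your server.
GRAPH_STORE = "http://localhost:3030/lexicon/data"


def graph_store_request(method, graph_uri, rdf_xml=None):
    """Build (but do not send) a Graph Store HTTP Protocol request.

    PUT replaces the named graph, POST merges triples into it,
    and DELETE removes it. rdf_xml is the RDF/XML payload as bytes.
    """
    url = GRAPH_STORE + "?" + urllib.parse.urlencode({"graph": graph_uri})
    headers = {}
    if rdf_xml is not None:
        headers["Content-Type"] = "application/rdf+xml"
    return urllib.request.Request(url, data=rdf_xml, headers=headers, method=method)
```

Passing the returned object to urllib.request.urlopen would perform the operation. Re-running the PUT whenever a source file changes keeps the corresponding graph in step with it.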








On 20 December 2014 at 01:00, Forrest, Stuart sforr...@bcgov.net wrote:

 Hi All

 My question is what do you guys use triplestores for?

 Thanks
 Stuart



 
 Stuart Forrest PhD
 Library Systems Specialist
 Beaufort County Library
 843 255 6450
 sforr...@bcgov.net

 http://www.beaufortcountylibrary.org

 For Leisure, For Learning, For Life



 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Stefano Bargioni
 Sent: Monday, November 11, 2013 8:53 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] rdf triplestores

 My +1 for Joseki.
 sb

 On 11/nov/2013, at 06.12, Eric Lease Morgan wrote:

  What is your favorite RDF triplestore?
 
  I am able to convert numerous library-related metadata formats into
 RDF/XML. In a minimal way, I can then contribute to the Semantic Web by
 simply putting the resulting files on an HTTP file system. But if I were to
 import my RDF/XML into a triplestore, then I could do a lot more. Jena
 seems like a good option. So does Openlink Virtuoso.
 
  What experience do y'all have with these tools, and do you know how to
 import RDF/XML into them?
 
  --
  Eric Lease Morgan
 



Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Ethan Gruber
I've been using Apache Fuseki (
http://jena.apache.org/documentation/serving_data/) for almost a year, in
production since the spring.  It's a SPARQL server with a built-in TDB store.
It's easy to use, and takes about 5 minutes to get working on your desktop
or server.

Ethan


On Mon, Nov 11, 2013 at 1:17 AM, Richard Wallis 
richard.wal...@dataliberate.com wrote:

 I've had some success with 4Store: http://4store.org

 Used it on mac laptop to load the WorldCat most highly held resources:
 http://dataliberate.com/2012/08/putting-worldcat-data-into-a-triple-store/

 As to the point about loading RDF/XML, especially if you have a large
 amount of data:

 - Triplestores much prefer raw triples for large amounts of data.
 - Chopping up files of triples into smaller chunks is also often
   beneficial, as it reduces memory footprint and can take advantage of
   multithreading. It also makes it far easier to recover from errors such
   as bad data.
 - A bit of unix command line wizardry (split followed by a simple
   for-loop) is fairly standard practice.

 Also raw triples are often easier to produce - none of that mucking about
 producing correctly formatted XML - and you can chop, sort, and play about
 with them using powerful unix command line tools.
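
 The chunking idea can also be sketched in Python. This assumes that "raw triples" means a line-based serialization such as N-Triples, where each line holds one complete triple and a file can be cut at any line boundary. The function name and chunk size are illustrative only.

```python
from itertools import islice


def chunk_triples(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines.

    Safe for line-based triple formats (assumed: N-Triples), because
    every line is a complete statement on its own.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

 Each chunk can then be written to its own file and loaded separately, so a bad triple only spoils one small load rather than the whole import.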

 ~Richard.


 On 11 November 2013 18:19, Scott Turnbull scott.turnb...@aptrust.org
 wrote:

  I've primarily used Sesame myself.  The http based queries made it pretty
  easy to script against.
 
  http://www.openrdf.org/
 
 
 
  --
  *Scott Turnbull*
  APTrust Technical Lead
  scott.turnb...@aptrust.org
  www.aptrust.org
  678-379-9488
 



 --
 Richard Wallis
 Founder, Data Liberate
 http://dataliberate.com
 Tel: +44 (0)7767 886 005

 Linkedin: http://www.linkedin.com/in/richardwallis
 Skype: richard.wallis1
 Twitter: @rjw



Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Stefano Bargioni
My +1 for Joseki.
sb



Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Ross Singer
I've used Fuseki a lot and really like it, although configuration for
things like LARQ (full text indexing) historically has been a little
underdocumented (and it can be a little difficult to understand what
component is in charge of what task).

4-Store is super simple to get up and running with, as well, but I haven't
used it in production for anything.

-Ross.




Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Kevin Ford
I'll second Richard on this.   4store is fairly quick to set up and get 
going.  It comes with command-line tools and an HTTP option.


FWIW, ID.LOC.GOV uses 4store in its stack.

Yours,
Kevin



Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Jason Stirnaman
Eric,
We just did a workshop at C4LMidwest on getting up and running with Fuseki and 
RDF/XML. Here's the 3-part tutorial (for OS X, but translates easily to Linux):
http://jstirnaman.wordpress.com/2013/10/11/installing-fuseki-with-jena-and-tdb-on-os-x/

Jason

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric 
Lease Morgan
Sent: Sunday, November 10, 2013 11:12 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] rdf triplestores

What is your favorite RDF triplestore?

I am able to convert numerous library-related metadata formats into RDF/XML. In 
a minimal way, I can then contribute to the Semantic Web by simply putting the 
resulting files on an HTTP file system. But if I were to import my RDF/XML into 
a triplestore, then I could do a lot more. Jena seems like a good option. So 
does Openlink Virtuoso. 

What experience do y'all have with these tools, and do you know how to import 
RDF/XML into them?

-- 
Eric Lease Morgan


Re: [CODE4LIB] rdf triplestores

2013-11-11 Thread Tom Johnson
We use 4Store at Oregon State University. I recommend it as very easy to
put up.

I've gone so far as to launch it live in a 20 minute talk.

- Tom



Re: [CODE4LIB] rdf triplestores

2013-11-10 Thread Scott Turnbull
I've primarily used Sesame myself.  The http based queries made it pretty
easy to script against.

http://www.openrdf.org/





-- 
*Scott Turnbull*
APTrust Technical Lead
scott.turnb...@aptrust.org
www.aptrust.org
678-379-9488