Re: [CODE4LIB] rdf triplestores
Hi Jeff,

So then are triple stores a means to an end, that is, just a vehicle for storing a type of data (i.e. graph data), like Access stores relational data?

On the path to learning this, what software would I install for experimenting?

Thanks,
Stuart

Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net
http://www.beaufortcountylibrary.org
For Leisure, For Learning, For Life

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mixter,Jeff
Sent: Friday, December 19, 2014 11:10 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

Stuart,

Since triplestores, in essence, store graph data, I think a slightly better question is what can you do with graph data (if you do not mind me rephrasing your question). From this perspective I would point to Facebook or LinkedIn as prime examples of what can be done with graph data. Obviously those do not necessarily translate well into what can be done with library graph data, but they do show the potential.

For libraries, I think one of the benefits will be expanded/enhanced discoverability for resources. With graph data it is much easier to search for an author (let's say Jane Austen) and find not only all of the books that she authored but also all of the books about her, all of the books that are about similar topics, published in similar periods. One can then imagine hopping from the Jane Austen node on the graph to a node that is a book she wrote (say Pride and Prejudice) and then to a subject node for the book (say Social Classes--Fiction). From there you could then find all of the authors that wrote books about that same topic and then navigate to those books. Our current ILS systems try to do this with MARC records, but because they are mostly string based, it is very difficult to accurately provide this type of information to users. Graph data helps overcome this hurdle.

This was a rather basic example of how end-users can benefit from graph data, but I think it is a compelling reason. I have attached a simple image to help visualize what I was talking about. In it the user would start by finding Author1, and then using the graph we (the library) could suggest that they might like Book2 (since it is about the same subject) or even Book3 (since it is by Author2, who wrote a book, Book2, that shared a common subject, Subject1, with the author, Author1, that was originally searched for). Again, this is very basic but would be rather difficult to do with a string-based record system. If you wanted to add complexity, you could start talking about discovery of multilingual items for bilingual users (since graph data should be language neutral).

Thanks,
Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org
Re: [CODE4LIB] rdf triplestores
Jeff (and Hugh):

Thanks for the clarification.

> The broader issue of comparing a relational database to a triple store has to do with limitations of RDBs that are not present when using triples. By nature RDBs are rigidly defined with set tables and properties. In RDF there are no real restrictions. Assuming you understand the model for the data in the graph, you can consume more data without crosswalking or converting it and then query against the data based on the model that it uses.

I can see how the lack of a fixed relationship scheme would be useful. I am considering using a triple store for a project where we are annotating data and we might want to expand the types of annotations we make over time (e.g., semantic tags, machine learning results), and where the particular way we refer to the data that is being annotated might depend on the type of annotation (e.g., a data set ID, an image region).

> The other major difference is that nodes in graph data almost always have (or at least should have) a persistent identifier. So 'Jane Austen', 'Austen, Jane', and '奥斯丁, 1775-1817' would all have the same identifier, and consequently a query could be made using just that one identifier to find related entities. This is possible in a relational database, but it would require a very well constructed table system and an extremely high level of maintenance and quality checking in order to keep it up to date.

I'm not sure the uniqueness of the persistent identifier is a big selling point (to me) of a triple store. It's possible to do what you are saying in a relational database, but it would be really bad design not to have a primary key for your author table. It seems like the strong selling point is to use the same set of persistent identifiers as someone else, so that you are speaking the same language. Otherwise your unique ID for Jane Austen is just as good as my unique ID in my relational authors table.

One of my concerns, apart from making the business case for triple stores to an organization that is heavily invested in relational DB technology, is that when I've experimented with importing RDF data that has come out of a triple store into a relational DB, I have had issues with things like relations pointing to persistent identifiers that don't exist in the current namespace (perhaps this is a feature, not a bug?) and loops in relationship graphs that shouldn't have loops. Relational DBs aren't any good at finding loops, but you'd think a graph DB would be set up to detect that kind of thing. This makes me wonder if the technology is really all that mature.

-Sarah

From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Sarah Weissman seweiss...@gmail.com
Sent: Friday, December 19, 2014 2:05 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

Jeff,

> With graph data it is much easier to search for an author (let's say Jane Austen) and find not only all of the books that she authored but also all of the books about her, all of the books that are about similar topics, published in similar periods. One can then imagine hopping from the Jane Austen node on the graph to a node that is a book she wrote (say Pride and Prejudice) and then to a subject node for the book (say Social Classes--Fiction). From there you could then find all of the authors that wrote books about that same topic and then navigate to those books.

When you say that it would be easier to discover these other relations from the Jane Austen node, do you mean that you can query for relations in a triplestore/graph DB more readily (efficiently?) than you can in an RDB?

It seems like the equivalent in the RDB model would be, given a piece of data used in a FK column in a table, to query for (if you even could) what other tables use the same FK, then query these tables, constraining to the Jane Austen value to see whether or not they had any data, which is not a natural way of using an RDB.

-Sarah
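
For what it's worth, the two import problems described above (relations that point at identifiers nobody describes, and loops in a relationship that should not have loops) can be checked with a few lines of code before loading anything into a relational schema. The following is only a sketch under stated assumptions: the triples, the ex: identifiers, the ex:broader predicate, and both function names are invented for illustration, and it uses plain Python rather than any particular RDF library or triplestore feature.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("ex:A", "ex:broader", "ex:B"),
    ("ex:B", "ex:broader", "ex:C"),
    ("ex:C", "ex:broader", "ex:A"),  # closes a loop: A -> B -> C -> A
    ("ex:D", "ex:broader", "ex:E"),  # ex:E never appears as a subject
]


def dangling_objects(triples):
    """Return objects that are never the subject of any triple."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for _, _, o in triples if o not in subjects})


def has_cycle(triples, predicate):
    """Depth-first search for a loop along edges labelled `predicate`."""
    edges = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            edges[s].append(o)

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)  # every node starts as UNSEEN

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in edges.get(node, []):
            if state[nxt] == IN_PROGRESS:  # reached a node already on the current path
                return True
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNSEEN and visit(n) for n in list(edges))


if __name__ == "__main__":
    print(dangling_objects(TRIPLES))
    print(has_cycle(TRIPLES, "ex:broader"))
```

Whether a dangling object is an error depends on the data: under RDF's open-world assumption it may simply refer to something described elsewhere, which is probably why it can look like "a feature, not a bug".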
Re: [CODE4LIB] rdf triplestores
Hi All,

My question is: what do you guys use triplestores for?

Thanks,
Stuart

Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net
http://www.beaufortcountylibrary.org
For Leisure, For Learning, For Life

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stefano Bargioni
Sent: Monday, November 11, 2013 8:53 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

My +1 for Joseki.
sb

On 11/nov/2013, at 06.12, Eric Lease Morgan wrote:

> What is your favorite RDF triplestore?
>
> I am able to convert numerous library-related metadata formats into RDF/XML. In a minimal way, I can then contribute to the Semantic Web by simply putting the resulting files on an HTTP file system. But if I were to import my RDF/XML into a triplestore, then I could do a lot more.
>
> Jena seems like a good option. So does Openlink Virtuoso. What experience do y'all have with these tools, and do you know how to import RDF/XML into them?
>
> --
> Eric Lease Morgan
Re: [CODE4LIB] rdf triplestores
A triplestore is basically a database backend for RDF triples. The major benefit is that it allows for SPARQL querying. You could imagine a triplestore as being the same thing as a relational database that can be queried with SQL.

The drawback that I have run into is that unless you have unlimited hardware, triplestores can run into scaling problems (when you are looking at hundreds of millions or billions of triples). This is a problem when you want to search for data. For searching I use a hybrid Elasticsearch (i.e. Lucene) index for the string literals and then go out to the triplestore to query for the data.

If you are looking to use a triplestore, it is important to distinguish between search and query. Triplestores are really good for query but not so good for search. The basic problem with search is that it is mostly string based, and this requires a regular expression query in SPARQL, which is expensive from a hardware perspective.

There are a few triplestores that use a hybrid model, in particular Jena Fuseki (http://jena.apache.org/documentation/query/text-query.html).

Thanks,
Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org

From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest, Stuart sforr...@bcgov.net
Sent: Friday, December 19, 2014 10:00 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

Hi All,

My question is: what do you guys use triplestores for?

Thanks,
Stuart

Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net
http://www.beaufortcountylibrary.org
For Leisure, For Learning, For Life
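
To make the search/query distinction above a little more concrete, here is a rough sketch in plain Python. It is not how Elasticsearch, Lucene, or any triplestore is implemented; the triples, the ex: identifiers, and the three function names are assumptions made up for the example. The regex scan stands in for a regular-expression filter that has to look at every string literal, the small inverted index stands in for a separate text index, and the identifier lookup stands in for the "query" step that the triplestore handles well.

```python
import re
from collections import defaultdict

# Hypothetical data as (subject, predicate, object); names and identifiers are invented.
TRIPLES = [
    ("ex:book1", "ex:title", "Pride and Prejudice"),
    ("ex:book1", "ex:author", "ex:austen"),
    ("ex:book2", "ex:title", "Sense and Sensibility"),
    ("ex:book2", "ex:author", "ex:austen"),
    ("ex:austen", "ex:name", "Jane Austen"),
]


def is_literal(value):
    """In this toy model, anything not starting with 'ex:' is a string literal."""
    return not value.startswith("ex:")


def regex_search(triples, pattern):
    """Search the slow way: test every literal against a regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted({s for s, _, o in triples if is_literal(o) and rx.search(o)})


def build_text_index(triples):
    """Build a tiny inverted index: lower-cased token -> set of subjects."""
    index = defaultdict(set)
    for s, _, o in triples:
        if is_literal(o):
            for token in re.findall(r"\w+", o.lower()):
                index[token].add(s)
    return index


def query_by_id(triples, subject):
    """Query: exact match on an identifier, with no string matching involved."""
    return [(p, o) for s, p, o in triples if s == subject]


if __name__ == "__main__":
    index = build_text_index(TRIPLES)
    for hit in sorted(index["prejudice"]):  # search step: text index
        print(hit, query_by_id(TRIPLES, hit))  # query step: identifier lookup
```

The design point is only that the text index is built once and then consulted per search, while the regex version repeats a full scan of the literals on every search.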
Re: [CODE4LIB] rdf triplestores
I recently extended Fuseki to hook into a Solr index for geographic query for one of our linked data projects, and I'm happy with the results so far. It will open the door for us to build more sophisticated geographic visualizations. I have not extended Fuseki for Lucene/Solr based full text search, as we have a standalone Solr index for that, and a separate search interface (for general users) from the SPARQL query interface (for advanced ones).

It's definitely true that there are scaling limitations in SPARQL--just look at how often dbpedia and the British Museum SPARQL endpoint go down. Hardware is overcoming these limitations, but I still advocate a hybrid approach: using Solr where it is advantageous to do so, and then building focused user interfaces on top of SPARQL, leveraging the advantages of a triplestore in contexts other than search. We open up our SPARQL endpoint to the public, but far more users interact with SPARQL through HTML interfaces in several different projects without having any idea that they are doing so.

We only have about a million triples in our triplestore (but this is going to grow enormously in less than two years, I think, as the floodgates are about to open in the world of ancient Greco-Roman coins), but the system has only gone down for about 2 minutes in the last 2.5 years, on a virtual machine with only 4GB of memory.

Ethan

On Fri, Dec 19, 2014 at 10:20 AM, Mixter,Jeff mixt...@oclc.org wrote:

> A triplestore is basically a database backend for RDF triples. The major benefit is that it allows for SPARQL querying. You could imagine a triplestore as being the same thing as a relational database that can be queried with SQL.
>
> The drawback that I have run into is that unless you have unlimited hardware, triplestores can run into scaling problems (when you are looking at hundreds of millions or billions of triples). This is a problem when you want to search for data. For searching I use a hybrid Elasticsearch (i.e. Lucene) index for the string literals and then go out to the triplestore to query for the data.
>
> If you are looking to use a triplestore, it is important to distinguish between search and query. Triplestores are really good for query but not so good for search. The basic problem with search is that it is mostly string based, and this requires a regular expression query in SPARQL, which is expensive from a hardware perspective.
>
> There are a few triplestores that use a hybrid model, in particular Jena Fuseki (http://jena.apache.org/documentation/query/text-query.html).
>
> Thanks,
> Jeff Mixter
> Research Support Specialist
> OCLC Research
> 614-761-5159
> mixt...@oclc.org
Re: [CODE4LIB] rdf triplestores
Thanks Jeff,

Interesting concept. Can you give me any examples of their usage, what kinds of data, etc.?

Thanks,
Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net
http://www.beaufortcountylibrary.org
For Leisure, For Learning, For Life

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mixter,Jeff
Sent: Friday, December 19, 2014 10:20 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

A triplestore is basically a database backend for RDF triples. The major benefit is that it allows for SPARQL querying. You could imagine a triplestore as being the same thing as a relational database that can be queried with SQL.

The drawback that I have run into is that unless you have unlimited hardware, triplestores can run into scaling problems (when you are looking at hundreds of millions or billions of triples). This is a problem when you want to search for data. For searching I use a hybrid Elasticsearch (i.e. Lucene) index for the string literals and then go out to the triplestore to query for the data.

If you are looking to use a triplestore, it is important to distinguish between search and query. Triplestores are really good for query but not so good for search. The basic problem with search is that it is mostly string based, and this requires a regular expression query in SPARQL, which is expensive from a hardware perspective.

There are a few triplestores that use a hybrid model, in particular Jena Fuseki (http://jena.apache.org/documentation/query/text-query.html).

Thanks,
Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org
Re: [CODE4LIB] rdf triplestores
DPLA is working on moving to a more RDF-aware stack, including Marmotta[1] as a triplestore, Linked Data Platform server, and Linked Data cache layer. You can check out our data model[2], which we use as a common format for special collections/archives/museum metadata aggregated from our partners.

Marmotta gives us RDF persistence with graph query via SPARQL, and a REST interface via LDP[3]. Most/all of our actual interactions with the data are mediated by ActiveTriples[4], an ORM-like interface to RDF resources. From there, it's just like any other application, with the benefits (and pitfalls) offered by a graph model, Open World, URIs, etc. becoming tangible from time to time.

[1] http://marmotta.apache.org/
[2] http://dp.la/info/wp-content/uploads/2013/04/DPLA-MAP-V3.1-2.pdf
[3] http://www.w3.org/TR/ldp/
[4] https://github.com/ActiveTriples/ActiveTriples

On Fri, Dec 19, 2014 at 7:32 AM, Forrest, Stuart sforr...@bcgov.net wrote:

> Thanks Jeff,
>
> Interesting concept. Can you give me any examples of their usage, what kinds of data, etc.?
>
> Thanks,
> Stuart Forrest PhD
> Library Systems Specialist
> Beaufort County Library
> 843 255 6450
> sforr...@bcgov.net
> http://www.beaufortcountylibrary.org
> For Leisure, For Learning, For Life
Re: [CODE4LIB] rdf triplestores
Stuart, Since triplestores, in essence, store graph data I think a slightly better question is what can you do with graph data (if you do not mind me rephrasing you question). From this perspective I would point to Facebook or LinkedIn as prime examples of what can be done with graph data. Obviously those do not necessarily translate well into what can be done with library graph data but it does show the potential. For libraries, I think one of the benefits will be expanded/enhanced discoverability for resources. With graph data it is much easier to search for an author (lets say Jane Austen) and find not only all of the books that she authored but also all of the books about her, all of the books that are about similar topics, published in similar periods. One can then imaging hopping from the Jane Austen node on the graph to a node that is a book she wrote (say Pride and Prejudice) and then to a subject node for the book (say Social Classes--Fiction). From there you could then find all of the Authors that wrote books about that same topic and then navigate to those books. Our current ILS systems try t o do this with MARC records but because they are mostly string based, it is very difficult to accurately provide this type of information to users. Graph data helps overcome this hurdle. This was a rather basic example of how end-users can benefit from graph data but I think it is a compelling reason. I have attached a simple image to help visualize what I was talking about. In it the user would start by finding Author1 and then using the graph we (the library) could suggest that they might like Book2 (since it is about the same subject) or even Book3 (since it is by Author2 who wrote a book, Book2, that shared a common subject, Subject1, with the author, Author1, that was originally searched for. Again, this is very basic but would be rather difficult to do with a string base record system. 
If you wanted to add complexity, you could start talking about discover of multi-lingual items for bilingual users (since graph data should be language neutral). Thanks, Jeff Mixter Research Support Specialist OCLC Research 614-761-5159 mixt...@oclc.org From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest, Stuart sforr...@bcgov.net Sent: Friday, December 19, 2014 10:32 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] rdf triplestores Thanks Jeff Interesting concept, can you give me any examples of their usage, what kinds of data etc.? Thanks Stuart Forrest PhD Library Systems Specialist Beaufort County Library 843 255 6450 sforr...@bcgov.net http://www.beaufortcountylibrary.org For Leisure, For Learning, For Life -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mixter,Jeff Sent: Friday, December 19, 2014 10:20 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] rdf triplestores A triplestore is basically a database backend for RDF triples. The major benefit is that it allows for SPARQL querying. You could imagine a triplestore as being the same thing as a relational database that can be queried with SQL. The drawback that I have run into is that unless you have unlimited hardware, triplestores can run into scaling problems (when you are looking at hundreds of millions or billions of triples). This is a problem when you want to search for data. For searching I use a hybrid Elasticsearch (i.e. Lucene) index for the string literals and the go out to the triplestore to query for the data. If you are looking to use a triplestore it is important to distinguish between search and query. Triplestore are really good for query but not so good for search. The basic problem with search is that is it mostly string based and this requires a regular expression query in SPARQL which is expensive from a hardware perspective. There are a few triple stores that use a hybrid model. 
In particular Jena Fuseki (http://jena.apache.org/documentation/query/text-query.html) Thanks, Jeff Mixter Research Support Specialist OCLC Research 614-761-5159 mixt...@oclc.org From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest, Stuart sforr...@bcgov.net Sent: Friday, December 19, 2014 10:00 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] rdf triplestores Hi All My question is what do you guys use triplestores for? Thanks Stuart Stuart Forrest PhD Library Systems Specialist Beaufort County Library 843 255 6450 sforr...@bcgov.net http://www.beaufortcountylibrary.org For Leisure, For Learning, For Life -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Stefano Bargioni Sent: Monday, November
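The search/query split Jeff describes in his 10:20 AM message can be sketched in a few lines. This is only an illustration under stated assumptions: a plain Python dictionary stands in for the Elasticsearch/Lucene index, a set of tuples stands in for the triplestore, and every identifier (`ex:austen`, `label`, `author`) is made up for the example.

```python
# Toy data: every node has an identifier; labels are plain strings.
# All identifiers and predicate names are hypothetical.
TRIPLES = {
    ("ex:austen", "label", "Jane Austen"),
    ("ex:pride", "label", "Pride and Prejudice"),
    ("ex:emma", "label", "Emma"),
    ("ex:pride", "author", "ex:austen"),
    ("ex:emma", "author", "ex:austen"),
}


def build_label_index(triples):
    """Search side: map each lowercased word of a label to node identifiers."""
    index = {}
    for s, p, o in triples:
        if p == "label":
            for word in o.lower().split():
                index.setdefault(word, set()).add(s)
    return index


def search(index, text):
    """Return identifiers whose label contains every word in `text`."""
    words = text.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


def books_by(triples, author_id):
    """Query side: follow the `author` edge using the identifier only."""
    return sorted(s for s, p, o in triples if p == "author" and o == author_id)


index = build_label_index(TRIPLES)
for author_id in sorted(search(index, "jane austen")):
    print(author_id, books_by(TRIPLES, author_id))
```

The string matching happens once, against the index; everything after that works on identifiers, which is the part a triplestore is good at.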
Re: [CODE4LIB] rdf triplestores
Thank you.
Re: [CODE4LIB] rdf triplestores
Jeff, "With graph data it is much easier to search for an author (let's say Jane Austen) and find not only all of the books that she authored but also all of the books about her, all of the books that are about similar topics, published in similar periods. One can then imagine hopping from the Jane Austen node on the graph to a node that is a book she wrote (say Pride and Prejudice) and then to a subject node for the book (say Social Classes--Fiction). From there you could then find all of the authors that wrote books about that same topic and then navigate to those books." When you say that it would be easier to discover these other relations from the Jane Austen node, do you mean that you can query for relations in a triplestore/graph DB more readily (efficiently?) than you can in an RDB? It seems like the equivalent in the RDB model would be, given a piece of data used in a FK column in a table, to query for (if you even could) what other tables use the same FK, then query these tables, constraining to the Jane Austen value to see whether or not they had any data, which is not a natural way of using an RDB. -Sarah
Re: [CODE4LIB] rdf triplestores
That's pretty much it. There are operations that are completely natural to a graph db that require either many joins or multiple queries to achieve with an RDBMS. It all depends very much on what sorts of data you're dealing with, and how you want to model that data. Graph databases can certainly be faster/more efficient at querying data with many joins than RDBMSs are. It's (unsurprisingly) easier to model data that looks like a web of relationships in a graph db than in an RDBMS. They're not so great at dealing with regular, record-shaped data on the other hand, nor document-shaped data for that matter. They are a very useful tool to have in your kit and using them will likely change the way you think about data modeling in a good way.
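The Author1/Book2/Book3 walk from Jeff's earlier message can be written as a handful of hops over triples. The sketch below is an assumption-laden toy: the attached image is not available here, so the edges are reconstructed from his prose, `Subject2` is invented to give Book3 a subject, and the `author`/`subject` predicate names are hypothetical.

```python
# Edges reconstructed from the prose description; Subject2 is invented.
TRIPLES = [
    ("Book1", "author", "Author1"),
    ("Book1", "subject", "Subject1"),
    ("Book2", "author", "Author2"),
    ("Book2", "subject", "Subject1"),
    ("Book3", "author", "Author2"),
    ("Book3", "subject", "Subject2"),
]


def subjects_with(triples, predicate, obj):
    """All nodes that point at `obj` through `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects_of(triples, subj, predicate):
    """All nodes that `subj` points at through `predicate`."""
    return {o for s, p, o in triples if s == subj and p == predicate}


def suggest(triples, author):
    """Return (books sharing a subject, further books by those books' authors)."""
    own = subjects_with(triples, "author", author)
    topics = set().union(*(objects_of(triples, b, "subject") for b in own))
    same_subject = set().union(
        *(subjects_with(triples, "subject", t) for t in topics)
    ) - own
    related_authors = set().union(
        *(objects_of(triples, b, "author") for b in same_subject)
    ) - {author}
    by_related = set().union(
        *(subjects_with(triples, "author", a) for a in related_authors)
    ) - own - same_subject
    return sorted(same_subject), sorted(by_related)


print(suggest(TRIPLES, "Author1"))
```

Each hop is the same kind of lookup, which is roughly what "natural to a graph db" means here; in a relational schema each hop would typically be another join.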
Re: [CODE4LIB] rdf triplestores
Sarah, I should have probably chosen a different word, or at least explained it better. One of the advantages that SPARQL has over SQL is simplicity of the syntax. There are many simple SPARQL queries that in SQL would require multiple outer joins. This simplicity does not necessarily relate to efficiency but it is worth noting. The broader issue of comparing a relational database to a triple store has to do with limitations of RDBs that are not present when using triples. By nature RDBs are rigidly defined with set tables and properties. In RDF there are no real restrictions. Assuming you understand the model for the data in the graph, you can consume more data without crosswalking or converting it and then query against the data based on the model that it uses. So with SQL, the structure of the database defines the queries you can make. Conversely, with SPARQL the queries are defined by the data that is in the triplestore (i.e. the database is agnostic). The other major difference is that nodes in graph data almost always (or at least should) have a persistent identifier. So 'Jane Austen', 'Austen, Jane', and '奥斯丁, 1775-1817' would all have the same identifier and consequently a query could be made using just that one identifier to find related entities. This is possible in a relational database but it would require a very well constructed table system and an extremely high level of maintenance and quality checking in order to keep it up to date. Most of the relational databases I have worked with either use strings for this type of search or, if identifiers are present, they are generated based on the unique string in a certain field (such as Name). The latter of these scenarios works well if you only have one language but as soon as you start dealing with multi-lingual data or dirty data (i.e. Jan Austin instead of Jane Austen) you run into problems. I am not sure if this answered your question or not.
Here are some resources on SPARQL queries that I have used in the past: http://www.cambridgesemantics.com/semantic-university/sparql-vs-sql-intro http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/Bizer-Schultz-Berlin-SPARQL-Benchmark-IJSWIS.pdf Thanks, Jeff Mixter Research Support Specialist OCLC Research 614-761-5159 mixt...@oclc.org
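Jeff's point about persistent identifiers can be made concrete with a small pattern matcher. This is a rough sketch, not SPARQL: the `match` helper, the `ex:` identifiers, and the `label`/`author` predicates are all hypothetical, and `None` simply plays the role a variable would play in a SPARQL triple pattern.

```python
# One identifier carries every form of the name; all identifiers are made up.
TRIPLES = [
    ("ex:person/1", "label", "Jane Austen"),
    ("ex:person/1", "label", "Austen, Jane"),
    ("ex:person/1", "label", "奥斯丁, 1775-1817"),
    ("ex:book/1", "label", "Pride and Prejudice"),
    ("ex:book/1", "author", "ex:person/1"),
    ("ex:book/2", "label", "Emma"),
    ("ex:book/2", "author", "ex:person/1"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples fitting the pattern; None matches anything."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def titles_for_name(triples, name):
    """Resolve a name string to identifiers, then query by identifier only."""
    ids = {s for s, _, _ in match(triples, p="label", o=name)}
    titles = set()
    for node in ids:
        for book, _, _ in match(triples, p="author", o=node):
            titles.update(o for _, _, o in match(triples, s=book, p="label"))
    return sorted(titles)


for name in ("Jane Austen", "Austen, Jane", "奥斯丁, 1775-1817"):
    print(name, "->", titles_for_name(TRIPLES, name))
```

All three forms of the name resolve to the same node, so the book lookup never touches a string again. A misspelling such as "Jan Austin" would still fail at the first step, which is where a separate search index earns its keep.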
Re: [CODE4LIB] rdf triplestores
Stuart, This presentation was given at the Code4Lib conference in 2009. It is a good starting point. http://www.slideshare.net/iandavis/30-minute-guide-to-rdf-and-linked-data I will dig around and try to find some other presentations or documents/articles that could be used for introductory purposes. Thanks, Jeff Mixter Research Support Specialist OCLC Research 614-761-5159 mixt...@oclc.org From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Forrest, Stuart sforr...@bcgov.net Sent: Friday, December 19, 2014 2:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] rdf triplestores This all sounds really interesting, can anyone recommend a resource for learning what it's all about and what it can be used for? Stuart Stuart Forrest PhD Library Systems Specialist Beaufort County Library 843 255 6450 sforr...@bcgov.net http://www.beaufortcountylibrary.org For Leisure, For Learning, For Life
Re: [CODE4LIB] rdf triplestores
Jeff Thanks I appreciate it. Stuart Stuart Forrest PhD Library Systems Specialist Beaufort County Library 843 255 6450 sforr...@bcgov.net http://www.beaufortcountylibrary.org For Leisure, For Learning, For Life
Re: [CODE4LIB] rdf triplestores
One thing I've been using a triple store for recently is to model a lexicographic dataset extracted from a bunch of TEI files. The TEI XML files are transcriptions of lexicons of various Australian Aboriginal languages; tables of English language words, with their equivalents supplied by native speakers of those languages, in outback Australia in the early 20th C. For this Aboriginal language project I wrote XSLT that converts one of these TEI files into an RDF/XML file in which the lexicographic data in the TEI is encoded in SKOS (a thesaurus vocabulary). I apply that stylesheet to each TEI file, and take the resulting RDF/XML file and store it in the RDF graph store with an HTTP PUT. Then I wrote SPARQL queries to query over the union of all those graphs, to extract statistics and analyze the full dataset. Using a triple store and a SPARQL query interface makes it much easier and more efficient to query the lexicographic data than it would be to query it directly from the TEI XML, using e.g. XQuery.

For my triple store I chose to use Apache Fuseki, because it implements all the SPARQL 1.1 protocols including the Graph Store HTTP Protocol http://www.w3.org/TR/sparql11-http-rdf-update/. The crucial thing with the SPARQL 1.1 HTTP Graph Store protocol is that your unit of data management is not at the level of individual triples, but at the level of groups of triples - Named Graphs - which are very much the same as the concept of a record in traditional data management systems. So although it's possible to use the older SPARQL Update Protocol to manage your RDF data, I think it's generally much easier to use the SPARQL Graph Store HTTP Protocol interface to keep the RDF up to date and in synch with the source data. In the SPARQL Update Protocol, you send the SPARQL server a command that inserts and/or deletes triples; so it's a kind of Remote Procedure Call style of protocol, whereas the Graph Store protocol is resource-oriented (RESTful); you simply identify a bunch of triples (a Named Graph), and use HTTP PUT to overwrite them with a new bunch of triples, or DELETE to remove them altogether, or POST to add new triples to the graph.
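The PUT/POST/DELETE behaviour described above for named graphs can be modelled without a server. The class below is only an in-memory stand-in written for this illustration, not Fuseki's API and not an HTTP client; the graph names and triples are hypothetical.

```python
class GraphStore:
    """In-memory stand-in for a store of named graphs (not a real server)."""

    def __init__(self):
        self.graphs = {}  # graph name -> set of (s, p, o) triples

    def put(self, name, triples):
        """Like HTTP PUT: replace the whole named graph."""
        self.graphs[name] = set(triples)

    def post(self, name, triples):
        """Like HTTP POST: merge new triples into the named graph."""
        self.graphs.setdefault(name, set()).update(triples)

    def delete(self, name):
        """Like HTTP DELETE: drop the named graph altogether."""
        self.graphs.pop(name, None)

    def union(self):
        """All triples across every named graph, as a query might see them."""
        return set().union(*self.graphs.values())


store = GraphStore()
# One named graph per source file, so a file behaves like a record.
store.put("lexicon-a", {("ex:w1", "prefLabel", "watr")})
# The source file was corrected: re-PUT the graph instead of patching triples.
store.put("lexicon-a", {("ex:w1", "prefLabel", "water")})
store.post("lexicon-a", {("ex:w2", "prefLabel", "fire")})
store.put("lexicon-b", {("ex:w3", "prefLabel", "stone")})
print(sorted(store.union()))
```

Re-sending a whole graph after its source file changes means nobody has to work out which individual triples to insert or delete, which is the convenience being described.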
Re: [CODE4LIB] rdf triplestores
I've been using Apache Fuseki (http://jena.apache.org/documentation/serving_data/) for almost a year, in production since the spring. It's a SPARQL server with a built-in TDB store. It's easy to use, and takes about 5 minutes to get working on your desktop or server. Ethan

On Mon, Nov 11, 2013 at 1:17 AM, Richard Wallis richard.wal...@dataliberate.com wrote: I've had some success with 4Store: http://4store.org Used it on mac laptop to load the WorldCat most highly held resources: http://dataliberate.com/2012/08/putting-worldcat-data-into-a-triple-store/ As to the point about loading RDF/XML, especially if you have a large amount of data:
- Triplestores much prefer raw triples for large amounts of data
- Chopping up files of triples into smaller chunks is also often beneficial as it reduces memory footprints and can take advantage of multithreading. It is also far easier to recover from errors such as bad data etc.
- A bit of unix command line wizardry (split followed by a simple for-loop) is fairly standard practice
Also raw triples are often easier to produce - none of that mucking about producing correctly formatted XML - and you can chop, sort, and play about with them using powerful unix command line tools. ~Richard.

On 11 November 2013 18:19, Scott Turnbull scott.turnb...@aptrust.org wrote: I've primarily used Sesame myself. The http based queries made it pretty easy to script against. http://www.openrdf.org/

On Mon, Nov 11, 2013 at 12:12 AM, Eric Lease Morgan emor...@nd.edu wrote: What is your favorite RDF triplestore? I am able to convert numerous library-related metadata formats into RDF/XML. In a minimal way, I can then contribute to the Semantic Web by simply putting the resulting files on an HTTP file system. But if I were to import my RDF/XML into a triplestore, then I could do a lot more. Jena seems like a good option. So does Openlink Virtuoso. What experience do y'all have with these tools, and do you know how to import RDF/XML into them?
-- Eric Lease Morgan -- *Scott Turnbull* APTrust Technical Lead scott.turnb...@aptrust.org www.aptrust.org 678-379-9488 -- Richard Wallis Founder, Data Liberate http://dataliberate.com Tel: +44 (0)7767 886 005 Linkedin: http://www.linkedin.com/in/richardwallis Skype: richard.wallis1 Twitter: @rjw
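
Richard's chunking advice in the quoted message above (split, then a simple for-loop) might look roughly like this in Python. It is only a sketch and assumes the data is already in N-Triples, where each line holds one complete triple, so cutting on line boundaries is safe. The function name, the chunk file naming, and the default chunk size are made up for illustration and are not part of any triplestore's tooling.

```python
from pathlib import Path


def split_ntriples(source, out_dir, lines_per_chunk=100_000):
    """Split a line-based N-Triples file into numbered chunk files.

    Hypothetical helper: N-Triples puts one complete triple on each line,
    so cutting on line boundaries never breaks a statement in half.
    Returns the list of chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    chunk = None
    with open(source, encoding="utf-8") as src:
        for index, line in enumerate(src):
            # Start a new chunk file every `lines_per_chunk` lines.
            if index % lines_per_chunk == 0:
                if chunk is not None:
                    chunk.close()
                path = out_dir / f"chunk-{len(chunk_paths):05d}.nt"
                chunk = open(path, "w", encoding="utf-8")
                chunk_paths.append(path)
            chunk.write(line)
    if chunk is not None:
        chunk.close()
    return chunk_paths
```

Each returned path can then be handed, one at a time in a loop, to whatever bulk loader your store provides. If one chunk fails on bad data, only that chunk needs fixing and reloading.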
Re: [CODE4LIB] rdf triplestores
My +1 for Joseki.

sb
Re: [CODE4LIB] rdf triplestores
I've used Fuseki a lot and really like it, although configuration for things like LARQ (full text indexing) historically has been a little underdocumented (and it can be a little difficult to understand what component is in charge of what task).

4-Store is super simple to get up and running with, as well, but I haven't used it in production for anything.

-Ross.
Re: [CODE4LIB] rdf triplestores
I'll second Richard on this. 4store is fairly quick to set up and get going. It comes with command-line tools and an HTTP option.

FWIW, ID.LOC.GOV uses 4store in its stack.

Yours,
Kevin
Re: [CODE4LIB] rdf triplestores
Eric,

We just did a workshop at C4LMidwest on getting up and running with Fuseki and RDF/XML. Here's the 3-part tutorial (for OS X, but translates easily to Linux): http://jstirnaman.wordpress.com/2013/10/11/installing-fuseki-with-jena-and-tdb-on-os-x/

Jason
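
On the "how do I import RDF/XML" part of Eric's question: one store-neutral route is the SPARQL 1.1 Graph Store HTTP Protocol, which Fuseki implements. The sketch below only builds the request, using the Python standard library. The server address and the dataset path in the usage comment are assumptions about a local test install, not something taken from the tutorial, and the function name is made up.

```python
import urllib.request


def build_rdfxml_upload(graph_store_url, rdfxml_bytes):
    """Build (but do not send) a POST that adds RDF/XML to the default graph.

    `graph_store_url` is the store's Graph Store Protocol endpoint; the
    `?default` parameter targets the default graph, and POST merges the
    payload into whatever is already there.
    """
    return urllib.request.Request(
        url=f"{graph_store_url}?default",
        data=rdfxml_bytes,
        method="POST",
        headers={"Content-Type": "application/rdf+xml"},
    )


# Example (assumed local endpoint, adjust to your own server and dataset):
# with open("records.rdf", "rb") as f:
#     request = build_rdfxml_upload("http://localhost:3030/ds/data", f.read())
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to check the URL and headers before anything touches the server.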
Re: [CODE4LIB] rdf triplestores
We use 4Store at Oregon State University. I recommend it as very easy to put up. I've gone so far as to launch it live in a 20 minute talk.

- Tom
Re: [CODE4LIB] rdf triplestores
I've primarily used Sesame myself. The http based queries made it pretty easy to script against. http://www.openrdf.org/

--
*Scott Turnbull*
APTrust Technical Lead
scott.turnb...@aptrust.org
www.aptrust.org
678-379-9488
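
To illustrate what scripting against HTTP-based queries can look like: under the SPARQL Protocol, a query is just a GET with the query text in a `query` parameter. This is a minimal sketch with the Python standard library. The endpoint URL in the usage comment is a placeholder and an assumption, since the real repository URL depends on how your server is set up, and the function name is made up.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url, query):
    """Build (but do not send) a SPARQL Protocol GET asking for JSON results."""
    # The protocol carries the query text in a URL-encoded `query` parameter.
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url=f"{endpoint_url}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


# Example (placeholder endpoint, replace with your repository's query URL):
# request = build_sparql_request(
#     "http://localhost:8080/my-repository",
#     "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
# )
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Because this is plain HTTP, the same request should work against any store that exposes a standard SPARQL endpoint, whichever one you end up choosing.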