Re: [CODE4LIB] linked data and open access
> And as has already been pointed out, no one has really shown an impressive end
> user use for linked data, which American decision making tends to be more
> driven by.

Well, that raises an important question: whether for an 'end user use' or any other use, do people have examples of neat/important/useful things done with linked data in Europe, especially things that would have been harder or less likely without the data being modelled/distributed as linked data?

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Brent Hanner [behan...@mediumaevum.com]
Sent: Monday, December 22, 2014 6:11 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] linked data and open access

There are deeper issues at work here than just the obvious surface issues. One of the reasons Europe embraced RDF triples and linked data was timing. The EU was forming its centralized information institutions at the same time the idea of linked data as a solution to certain problems came about. So they took it and ran with it. In the US we have been primarily driven by the big data movement that gained steam shortly after. And as has already been pointed out, no one has really shown an impressive end user use for linked data, which American decision making tends to be more driven by.

Europeans can think about data and databases differently than we can here in the US. In Europe a database is intellectual property; in the US only the parts of a database that fall under copyright law are intellectual property, which for most databases isn't much. You can’t copyright a fact. So in the US, once you release the data into the wild it's usually public domain.

As for government data, the Federal and most state governments are in need of an overhaul that would make it possible. If you don’t have the systems or people in place who can make it happen, it won’t happen. Heck, the federal government can’t even get a single set of accounting software and whatnot.

So it isn’t just a lack of leadership or will; there are other things at work as well.

Brent

Sent from Windows Mail

From: Karen Coyle
Sent: Friday, December 19, 2014 10:32 AM
To: CODE4LIB@LISTSERV.ND.EDU

Yep, yep, and yep. Plus I'd add that the lack of centralization of library direction (read: states) is also a hindrance here. Having national leadership would be great. Being smaller also wouldn't hurt.

kc

On 12/19/14 6:48 AM, Eric Lease Morgan wrote:
> I don’t know about y’all, but it seems to me that things like linked data and
> open access are larger trends in Europe than here in the United States. Is
> there a larger commitment to sharing in Europe when compared to the United
> States? If so, is this a factor based on the nonexistence of a national
> library in the United States? Is this your perception too? --Eric Morgan

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600
[CODE4LIB] DC-2015 website and call for participation now open
*Apologies for cross-posting*

*Metadata and Ubiquitous Access to Culture, Science and Digital Humanities*
*DCMI 20th Anniversary International Conference & Annual Meeting*
*September 1-5, 2015 - São Paulo, Brazil*

The Conference website and the call for participation for DC-2015 are now open.

*Conference Website:* http://purl.org/dcevents/dc-2015
*Call for Participation:* http://purl.org/dcevents/dc-2015/cfp
*Track Policies:* http://dcevents.dublincore.org/index.php/IntConf/dc-2015/schedConf/trackPolicies

*Abstract:* The need for structured metadata to support ubiquitous access across the Web to the treasure troves of resources spanning cultures, in science, and in the digital humanities is now common knowledge among information systems designers and implementers. Structured metadata expressed through languages of description make it possible for us to 'speak' about the contents of our treasure troves. But, like all human languages, our languages of description both enable and isolate. The push to break out of the isolation of the metadata silos in which professionals inevitably design, implement and manage metadata, in order to discover the intersections of our treasure troves, drives much of today's discourse and emerging practice in metadata. The emergence of massively integrated Web presences such as Europeana and the Digital Public Library of America (DPLA), along with the reshaping of public access globally through mechanisms such as Linked Data and schema.org, drive our conversations, our excitement, and our fears.

*IMPORTANT DATES:*

*Technical Program Deadlines:*
*Peer-Reviewed Papers, Project Reports & Posters*
--*Submission Deadline:* 28 March 2015
--*Author Notification:* 23 June 2015
--*Final Copy:* 28 July 2015

*Professional Program Deadlines*
*Special & Panel Sessions*
--*Proposal Deadline:* 28 March 2015
--*Author Notification:* 25 April 2015

*Best Practice Posters & Demonstrations*
--*Submission Deadline:* 14 July 2015
--*Author Notification:* Ongoing

*Join us in São Paulo, Brazil*

Each of the past 20 years, the metadata community has gathered for DCMI's conference and annual meeting. The work agenda of the DCMI community is broad and inclusive of all aspects of innovation in metadata design, implementation and best practices. While the work of the Initiative progresses throughout the year, the annual meeting and conference provide the opportunity for DCMI "citizens" as well as newcomers, students, apprentices, and early career professionals to gather face-to-face to share experiences and knowledge. In addition, the gathering provides an opportunity for public- and private-sector initiatives beyond DCMI that are doing significant metadata work to come together to compare notes and cast a broader light into their particular metadata work silos. Through such a gathering of the metadata communities, DCMI advances its "first goal" of promoting metadata interoperability and harmonization.

This year, the annual meeting and conference are being hosted by the Universidade Estadual Paulista--São Paulo State University (UNESP) and held in São Paulo, Brazil.

*CONFERENCE ORGANIZERS:*
--Universidade Estadual Paulista--São Paulo State University (UNESP)
--Dublin Core Metadata Initiative (DCMI)

*CONFERENCE CHAIRS:*
--Plácida Santos, Professor, Universidade Estadual Paulista (UNESP), Brazil
--Silvana Borsetti Gregorio Vidotti, Professor, Universidade Estadual Paulista (UNESP), Brazil
--Flávia Maria Bastos, CGB Coordinator, General Coordination of Libraries, Universidade Estadual Paulista (UNESP), Brazil
--Mariana Curado Malta, CEISE/ISCAP - Polytechnic of Oporto, Portugal; Algoritmi Center - University of Minho, Portugal
Re: [CODE4LIB] rdf triplestores
Jeff (and Hugh):

Thanks for the clarification.

> The broader issue of comparing a relational database to a triple store has
> to do with limitations of RDBs that are not present when using triples. By
> nature RDBs are rigidly defined with set tables and properties. In RDF there
> are no real restrictions. Assuming you understand the model for the data in
> the graph, you can consume more data without crosswalking or converting it
> and then query against the data based on the model that it uses.

I can see how the lack of a fixed relationship scheme would be useful. I am considering using a triple store for a project where we are annotating data and we might want to expand the types of annotations we make over time (e.g., semantic tags, machine learning results), and where the particular way we refer to the data that is being annotated might depend on the type of annotation (e.g., a data set ID, an image region).

> The other major difference is that nodes in graph data almost always (or at
> least should) have a persistent identifier. So 'Jane Austen', 'Austen,
> Jane', and '奥斯丁, 1775-1817' would all have the same identifier and
> consequently a query could be made using just that one identifier to find
> related entities. This is possible in a relational database but it would
> require a very well constructed table system and an extremely high level of
> maintenance and quality checking in order to keep it up to date.

I'm not sure the uniqueness of the persistent identifier is a big selling point (to me) of a triple store. It's possible to do what you are saying in a relational database, and it would be really bad design not to have a primary key for your author table. It seems like the strong selling point is using the same set of persistent identifiers as someone else, so that you are speaking the same language. Otherwise your unique ID for Jane Austen is just as good as my unique ID in my relational authors table.

One of my concerns, apart from making the business case for triple stores to an organization that is heavily invested in relational DB technology, is that when I've experimented with importing RDF data that has come out of a triple store into a relational DB, I have had issues with things like relations pointing to persistent identifiers that don't exist in the current namespace (perhaps this is a feature, not a bug?) and loops in relationship graphs that shouldn't have loops. Relational DBs aren't any good at finding loops, but you'd think a graph DB would be set up to detect that kind of thing. This makes me wonder if the technology is really all that mature.

-Sarah

> From: Code for Libraries on behalf of Sarah Weissman
> Sent: Friday, December 19, 2014 2:05 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] rdf triplestores
>
> Jeff,
>
> > With graph data it is much easier to search for an author (let's say Jane
> > Austen) and find not only all of the books that she authored but also all
> > of the books about her, all of the books that are about similar topics,
> > published in similar periods. One can then imagine hopping from the Jane
> > Austen node on the graph to a node that is a book she wrote (say Pride and
> > Prejudice) and then to a subject node for the book (say "Social
> > Classes--Fiction"). From there you could then find all of the authors that
> > wrote books about that same topic and then navigate to those books.
>
> When you say that it would be "easier" to discover these other relations
> from the Jane Austen node, do you mean that you can query for relations in
> a triplestore/graph DB more readily (efficiently?) than you can in an RDB?
> It seems like the equivalent in the RDB model would be, given a piece of
> data used in a FK column in a table, to query for (if you even could) what
> other tables use the same FK, then query these tables, constraining to the
> Jane Austen value to see whether or not they had any data, which is not a
> "natural" way of using an RDB.
>
> -Sarah
>
> On Fri, Dec 19, 2014 at 11:10 AM, Mixter,Jeff wrote:
>
> > Stuart,
> >
> > Since triplestores, in essence, store graph data, I think a slightly better
> > question is what can you do with graph data (if you do not mind me
> > rephrasing your question).
> >
> > From this perspective I would point to Facebook or LinkedIn as prime
> > examples of what can be done with graph data. Obviously those do not
> > necessarily translate well into what can be done with library graph data
> > but it does show the potential. For libraries, I think one of the benefits
> > will be expanded/enhanced discoverability for resources.
> >
> > With graph data it is much easier to search for an author (let's say Jane
> > Austen) and find not only all of the books that she authored but also all
> > of the books about her, all of the books that are about similar topics,
> > published in similar periods. One can then imagine hopping from
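
The two import problems Sarah describes above (relations that point at identifiers never described in the local data, and loops in relationships that are meant to be hierarchical) can both be checked mechanically once the triples are loaded. What follows is only a minimal Python sketch, not a feature of any particular triple store: the `ex:` identifiers, the `ex:partOf` predicate, and the function names are all invented for illustration. Note that RDF itself allows a statement to point at an identifier that is not described locally, so the first check reports candidates for review rather than errors.

```python
# Hypothetical (subject, predicate, object) triples; every ex: name is made up.
TRIPLES = [
    ("ex:austen", "ex:wrote", "ex:pride_and_prejudice"),
    ("ex:pride_and_prejudice", "ex:partOf", "ex:collected_works"),
    ("ex:collected_works", "ex:partOf", "ex:pride_and_prejudice"),  # unintended loop
    ("ex:emma", "ex:partOf", "ex:missing_series"),  # object is never described
]


def dangling_objects(triples):
    """Return objects that never occur as a subject in this data set."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for _, _, o in triples if o not in subjects})


def has_cycle(triples, predicate):
    """Depth-first search for a loop along edges that use one predicate."""
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:  # reached a node already on the current path
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(edges))


print(dangling_objects(TRIPLES))
print(has_cycle(TRIPLES, "ex:partOf"))
```

Running checks like these before loading into relational tables would at least show which foreign keys will have no target row and which hierarchies cannot be stored as trees.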
[CODE4LIB] Job: Metadata Specialist (Specialist III) at The New York Public Library
Metadata Specialist (Specialist III)
The New York Public Library
New York, New York

_**Overview:**_

The Metadata Services Unit (MSU) of NYPL Labs is seeking its newest member: a creative, self-motivated specialist to wrangle metadata, oversee workflows, make batch enhancements across a corpus of more than a million metadata records, contribute to the conceptualization and rollout of experimental data collection and remediation tools, and lots more. This is a perfect opportunity for an enthusiastic, problem-solving individual interested in the full digital library lifecycle, from digitization to the creation of user-engagement tools and public programs. Passionate interest in the future of libraries a must.

The Metadata Services Unit supports the discovery, use, and innovative reuse of NYPL's unique digital resources on the Web, particularly via its new Digital Collections platform (http://digitalcollections.nypl.org/) and other tools. MSU defines local standards, monitors metadata quality, and provides training and support to metadata creators across NYPL.

The Metadata Services Unit is part of the New York Public Library Labs (NYPL Labs). Based dually at the Library's landmark central branch on 42nd Street, and at its cutting-edge services center in Long Island City, NYPL Labs is an interdisciplinary team working to reformat and reposition the Library's knowledge for the Internet age. Labs combines core digital library capacities (digitization, metadata, permissions/reproductions etc.) with an award-winning tech/design and outreach team focused on deepening engagement with digital collections and data, and fostering new forms of research and creativity.

_**Responsibilities:**_

Reporting to the Manager, Metadata Services Unit, this position:

- Creates, updates, and enhances metadata for the Library's digital collections.
- Coordinates metadata creation workflows for digitization projects in close collaboration with curators, Digital Imaging Unit staff, Rights staff, and metadata creators.
- Trains staff across NYPL on tools, policies, and procedures to ensure NYPL metadata complies with local and international standards and practices.
- Scripts batch processes for uploading and updating metadata in NYPL's Metadata Management System.
- Manages workflows for ingesting locally scanned materials into HathiTrust and updating local ILS records with HathiTrust links.
- Contributes to local metadata policies, procedures, and standards as well as technical requirements for NYPL's Metadata Management System, digital repository, and other tools.
- Performs other duties as required.

_**Key Competencies:**_

- The ability to establish priorities, follow project timelines, and meet deadlines while working independently and with minimal supervision
- Accuracy and attention to detail
- The ability to identify opportunities for, and implement solutions to achieve, greater efficiency in a production environment

_**Qualifications:**_

- Master's degree required, ALA-accredited MLIS preferred, Archival Studies Certificate a plus.
- Strong interpersonal, oral, and written communication skills.
- Demonstrated ability to work well collaboratively and independently on complex projects involving diverse participants outside the direct work unit, and to meet project deadlines.
- Demonstrated organizational, analytical, and problem-solving skills, with attention to detail and a high level of accuracy.
- Experience interpreting and applying descriptive content standards (such as RDA, DACS, CCO, etc.) in a non-MARC metadata environment.
- Demonstrated knowledge of data and database structures, metadata standards, and encoding schema, including MARC21, Dublin Core, MODS, METS and EAD.
- Experience using scripting and querying languages such as Python, SQL, JavaScript, Bash, etc. to extract, analyze, or manipulate metadata.
- Familiarity with Linked Data concepts and technologies.

_**Starting Salary:**_ USD $55,615.00/Yr.

_**Union / Non Union:**_ Local 1930

**TO APPLY, PLEASE VISIT THE FOLLOWING LINK:**
https://jobs-nypl.icims.com/jobs/8199/metadata-specialist-%28specialist-iii%29/job?mode=view&mobile=false&width=750&height=500&bga=true&needsRedirect=false

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/18700/
To post a new job please visit http://jobs.code4lib.org/
Re: [CODE4LIB] rdf triplestores
Hi Jeff

So are triple stores a means to an end, just a vehicle for storing one type of data, i.e. graph data? Like Access stores relational data?

On the path to learning this, what software would I install for experimenting?

Thanks
Stuart

Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net
http://www.beaufortcountylibrary.org
For Leisure, For Learning, For Life

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mixter,Jeff
Sent: Friday, December 19, 2014 11:10 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

Stuart,

Since triplestores, in essence, store graph data, I think a slightly better question is what can you do with graph data (if you do not mind me rephrasing your question).

From this perspective I would point to Facebook or LinkedIn as prime examples of what can be done with graph data. Obviously those do not necessarily translate well into what can be done with library graph data, but they do show the potential. For libraries, I think one of the benefits will be expanded/enhanced discoverability for resources.

With graph data it is much easier to search for an author (let's say Jane Austen) and find not only all of the books that she authored but also all of the books about her, all of the books that are about similar topics, published in similar periods. One can then imagine hopping from the Jane Austen node on the graph to a node that is a book she wrote (say Pride and Prejudice) and then to a subject node for the book (say "Social Classes--Fiction"). From there you could then find all of the authors that wrote books about that same topic and then navigate to those books. Our current ILS systems try to do this with MARC records, but because they are mostly string based, it is very difficult to accurately provide this type of information to users. Graph data helps overcome this hurdle.

This was a rather basic example of how end-users can benefit from graph data but I think it is a compelling reason. I have attached a simple image to help visualize what I was talking about. In it the user would start by finding Author1, and then using the graph we (the library) could suggest that they might like Book2 (since it is about the same subject) or even Book3 (since it is by Author2, who wrote a book, Book2, that shares a common subject, Subject1, with the author, Author1, that was originally searched for). Again, this is very basic but would be rather difficult to do with a string-based record system. If you wanted to add complexity, you could start talking about discovery of multilingual items for bilingual users (since graph data should be language neutral).

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org

From: Code for Libraries on behalf of Forrest, Stuart
Sent: Friday, December 19, 2014 10:32 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

Thanks Jeff

Interesting concept, can you give me any examples of their usage, what kinds of data etc.?

Thanks

Stuart Forrest PhD
Library Systems Specialist
Beaufort County Library
843 255 6450
sforr...@bcgov.net
http://www.beaufortcountylibrary.org
For Leisure, For Learning, For Life

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mixter,Jeff
Sent: Friday, December 19, 2014 10:20 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] rdf triplestores

A triplestore is basically a database backend for RDF triples. The major benefit is that it allows for SPARQL querying. You could imagine a triplestore as being the same thing as a relational database that can be queried with SQL.

The drawback that I have run into is that unless you have unlimited hardware, triplestores can run into scaling problems (when you are looking at hundreds of millions or billions of triples). This is a problem when you want to search for data. For searching I use a hybrid approach: an Elasticsearch (i.e. Lucene) index for the string literals, and then I go out to the triplestore to query for the data.

If you are looking to use a triplestore it is important to distinguish between search and query. Triplestores are really good for query but not so good for search. The basic problem with search is that it is mostly string based, and this requires a regular expression query in SPARQL, which is expensive from a hardware perspective. There are a few triplestores that use a hybrid model, in particular Jena Fuseki (http://jena.apache.org/documentation/query/text-query.html).

Thanks,

Jeff Mixter
Research Support Specialist
OCLC Research
614-761-5159
mixt...@oclc.org
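
Jeff's Author1/Book2/Book3 example earlier in this thread can be tried out without installing a triple store at all. The Python sketch below is only an illustration under stated assumptions: it keeps triples in a plain set, the identifiers and the `wrote`/`about` predicates are invented, and `match` is a hypothetical helper that stands in for the pattern matching a SPARQL engine would do. It shows the "hop" from an author to their books, to the subjects of those books, and back out to other books and authors.

```python
# Hypothetical data mirroring the Author1 / Book2 / Book3 example; all names are made up.
TRIPLES = {
    ("author1", "wrote", "book1"),
    ("author2", "wrote", "book2"),
    ("author2", "wrote", "book3"),
    ("book1", "about", "subject1"),
    ("book2", "about", "subject1"),
    ("book3", "about", "subject2"),
}


def match(s=None, p=None, o=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def suggest(author):
    """Return (books sharing a subject, further books by those books' authors)."""
    own = {o for _, _, o in match(s=author, p="wrote")}
    subjects = {o for book in own for _, _, o in match(s=book, p="about")}
    same_subject = {
        s for subj in subjects for s, _, _ in match(p="about", o=subj)
    } - own
    other_authors = {
        s for book in same_subject for s, _, _ in match(p="wrote", o=book)
    } - {author}
    by_those_authors = {
        o for a in other_authors for _, _, o in match(s=a, p="wrote")
    } - own - same_subject
    return sorted(same_subject), sorted(by_those_authors)


print(suggest("author1"))
```

Each step is an exact lookup on identifiers, which is the "query" side of Jeff's search-versus-query distinction; finding the starting node from a user's typed string is the "search" side, and is the part he hands to a text index.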