Re: [CODE4LIB] Fwd: [rules] Publication of the RDA Element Vocabularies
On Sat, Jan 25, 2014 at 6:20 AM, Jon Phipps jphi...@madcreek.com wrote: On Fri, Jan 24, 2014 at 11:16 AM, Robert Sanderson azarot...@gmail.com wrote: "All in my opinion, and all debatable. I hope that your choice goes well for you." I'd like to repeat: just because I agree with that choice, and I'm defending it here, it wasn't my choice. Not at all. And the concerns you express were well-aired and very carefully considered before the choice was made. "And yours :)" Ok, that makes me feel a bit personally defensive... Apologies! It was too much shorthand. I meant that your concerns were well-aired, and well explained, in this thread :) Rob
Re: [CODE4LIB] Fwd: [rules] Publication of the RDA Element Vocabularies
On Fri, Jan 24, 2014 at 7:56 AM, Jon Phipps jphi...@madcreek.com wrote: Hi Rob, the conversation continues below... On Thu, Jan 23, 2014 at 7:01 PM, Robert Sanderson azarot...@gmail.com wrote: Hi Jon, To present the other side of the argument so that others on the list can make an informed decision... Thanks for reminding me that this is an academic panel discussion in front of an audience, rather than a conversation. On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps jphi...@madcreek.com wrote: I've developed a quite strong opinion that vocabulary developers should not _ever_ think that they can understand the semantics of a vocabulary resource by 'reading' the URI. 100% agreed. Good documentation is essential for any ontology, and it has to be read to understand the semantics. You cannot just look at oa:hasTarget, out of context, and have any idea what it refers to. However, if that URI is readable it makes developers' lives much easier in a lot of situations, and it has no additional cost. Opaque URIs for predicates are the digital equivalent of thumbing your nose at the people you should be courting -- the people who will actually use your ontology in any practical sense. It says: "We don't care about you enough to make your life one step easier by having something that's memorable." You will always have to go back to the ontology every time and reread this documentation, over and over and over again. What you suggest is that an identifier (e.g. @azaroth42, or ORCID 0000-0003-4441-6852, https://orcid.org/0000-0003-4441-6852) should always be readable as a convenience to the developer. RDA does provide a 'readable in the language of the reader' URI specifically as a convenience to the developer -- a feature that I lobbied for. It's just not the /canonical/ URI, because it's an identifier of a property, not the property itself, and that property is independent of the language used to label it. It's the difference between "Metadata Management Associates, PO Box 282, Jacksonville, NY 14854, USA" (for people) and "14854-0282" (a perfectly functional complete address in the USA namespace), which is precisely the same identifier of that box for machines, and ultimately for the postmaster. The postmaster doesn't care whose name is on the box numbered 282; he only needs to know that highly memorable name when someone uses the convenience of not bothering to look up the box number and just sends mail addressed to us at 14854, or even just Jacksonville. And no, I don't want to start a URL vs. URI/URN/IRI discussion. Do you have some expectation that in order for the data to be useful your relational or object database identifiers must be readable? Identifiers for objects, no. The table names and field names? Yes. How many DBAs do you know that create tables with opaque identifiers for the column names? How many XML schemas do you know that use opaque identifiers for the element names? My count is 0 from many many many instances. And the reason is the same as having readable predicate URIs -- so that when you look at the table, schema, ontology, triple or what have you, there is some mnemonic value from the name to its intent. Our experience obviously differs in this regard. I've seen many, many databases that have relatively opaque column identifiers that were relabeled in the query to suit the audience for the query. I've seen many French databases, with French content, intended for a French audience, designed by French developers, that had French 'column headers'.
The point here is that the identifiers /identify/ a property that exists independent of the language of the data being used to describe a resource. If RDA _had_ to pick a single language to satisfy your requirement for a single readable identifier, which one? To assume that the one language should be English says to the non-English-speaking world: "We don't care about you enough to make your life one step easier by having something that's memorable." By whom, and in English? This to me is a frankly colonial assumption of the dominance of English in the world of metadata. In the world of computing in general: 'for', 'if', 'while' ... all English. While there are Turing-complete languages out there, the ones that don't have real-world language constructions are toys, like Whitespace for example. Even the lolcats programming language is more usable than Whitespace. Again, it's a cost/value consideration. There are many people who will understand English, and when developers program, they're surrounded by it. If your intended audience is primarily people who speak French, then you would be entirely justified in using URIs with labels from French. Or Chinese, though the IRI expansion would be more of a pain :) Despite the fact that developers are surrounded by English, I've worked with many highly skilled developers who didn't speak or read English, and who relied on documentation and meetings in their own language.
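A minimal sketch (Python with rdflib) of the pattern being debated here: one canonical, language-neutral property URI, with readability supplied by per-language labels rather than by the URI itself. The property number comes from later in this thread; the French label text and the exact lexical-alias URI are illustrative assumptions, not RDA's actual published data.

    # One opaque canonical URI, many readable labels -- illustrative only.
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import RDFS, OWL

    RDAA = Namespace("http://rdaregistry.info/Elements/a/")
    g = Graph()

    canonical = RDAA.P50033  # opaque, language-independent identifier
    g.add((canonical, RDFS.label,
           Literal("other designation associated with the corporate body", lang="en")))
    g.add((canonical, RDFS.label,
           Literal("autre designation associee a la collectivite", lang="fr")))  # made-up label

    # hypothetical readable convenience alias, resolving to the same property
    alias = URIRef("http://rdaregistry.info/Elements/a/otherDesignationAssociatedWithTheCorporateBody")
    g.add((alias, OWL.sameAs, canonical))

    # a developer can look the property up by a label in their own language
    for s in g.subjects(RDFS.label,
                        Literal("autre designation associee a la collectivite", lang="fr")):
        print(s)  # -> http://rdaregistry.info/Elements/a/P50033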
Re: [CODE4LIB] Fwd: [rules] Publication of the RDA Element Vocabularies
(Sorry for a previous empty message) Hi Jon, On Fri, Jan 24, 2014 at 7:56 AM, Jon Phipps jphi...@madcreek.com wrote: Hi Rob, the conversation continues below... On Thu, Jan 23, 2014 at 7:01 PM, Robert Sanderson azarot...@gmail.com wrote: To present the other side of the argument so that others on the list can make an informed decision... Thanks for reminding me that this is an academic panel discussion in front of an audience, rather than a conversation. Heh :) I just meant that I wasn't trying to convince you to change, just that I wanted to voice my concerns. (But, yes, touché!) On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps jphi...@madcreek.com wrote: However, if that URI is readable it makes developers' lives much easier in a lot of situations, and it has no additional cost. Opaque URIs for predicates are the digital equivalent of thumbing your nose at the people you should be courting. What you suggest is that an identifier (e.g. @azaroth42, or ORCID 0000-0003-4441-6852, https://orcid.org/0000-0003-4441-6852) should always be readable as a convenience to the developer. Those are identifiers for objects or entities, not predicates. As I said, I'm happy for entities to have opaque URIs. Where we disagree is that you can carry over that same rationale to predicates/properties/relationships. RDA does provide a 'readable in the language of the reader' URI specifically as a convenience to the developer -- a feature that I lobbied for. It's just not the /canonical/ URI, because it's an identifier of a property, not the property itself, and that property is independent of the language used to label it. So this, IMO, is where the trouble starts. People /will/ use those convenience URIs. And that will make for a nightmare in terms of interoperability (see below). It's the difference between "Metadata Management Associates, PO Box 282, Jacksonville, NY 14854, USA" (for people) and "14854-0282" (a perfectly functional complete address in the USA namespace), which is precisely the same identifier of that box for machines. Which is also an entity, not a predicate. (I almost said property there, which would be amusingly incorrect.) Do you have some expectation that in order for the data to be useful your relational or object database identifiers must be readable? Identifiers for objects, no. The table names and field names? Yes. How many DBAs do you know that create tables with opaque identifiers for the column names? How many XML schemas do you know that use opaque identifiers for the element names? My count is 0 from many many many instances. And the reason is the same as having readable predicate URIs -- so that when you look at the table, schema, ontology, triple or what have you, there is some mnemonic value from the name to its intent. Our experience obviously differs in this regard. I've seen many, many databases that have relatively opaque column identifiers that were relabeled in the query to suit the audience for the query. I've seen many French databases, with French content, intended for a French audience, designed by French developers, that had French 'column headers'. Yes, but French column headers are not opaque. How many schemas have completely opaque, non-linguistic column headers, element names, etc.? I'm not talking relatively opaque, I mean P12345 or similar. I didn't count MARC in my 0, which is strictly true as it's not XML or a relational table, but you could say 1 to be fair. Yes, sometimes they're PrpCtr or similar, but that's at least somewhat readable (Property Counter, perhaps?)
compared to a UUID or random integer. The point here is that the identifiers /identify/ a property that exists independent of the language of the data being used to describe a resource. If RDA _had_ to pick a single language to satisfy your requirement for a single readable identifier, which one? To assume that the one language should be English says to the non-English-speaking world: "We don't care about you enough to make your life one step easier by having something that's memorable." My problem is not with the idea that properties exist independently of language, it's the side effect of not picking a language to use. If you had to pick one, then you should pick one. If you want to make a political stand, don't pick English. But at least pick one, and only one. Not caring about the non-English-speaking world is at least caring about some people, rather than no one. Or the non-French-speaking world. Despite the fact that developers are surrounded by English, I've worked with many highly skilled developers who didn't speak or read English. Who relied on documentation and meetings in their own language. Likewise, though admittedly primarily European languages rather than Asian. However, even if someone doesn't speak English (or Italian, or French, or German), a language-based construct is more memorable than an entirely opaque code.
Re: [CODE4LIB] Fwd: [rules] Publication of the RDA Element Vocabularies
Hi Jon, To present the other side of the argument so that others on the list can make an informed decision... On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps jphi...@madcreek.com wrote: I've developed a quite strong opinion that vocabulary developers should not _ever_ think that they can understand the semantics of a vocabulary resource by 'reading' the URI. 100% agreed. Good documentation is essential for any ontology, and it has to be read to understand the semantics. You cannot just look at oa:hasTarget, out of context, and have any idea what it refers to. However, if that URI is readable it makes developers' lives much easier in a lot of situations, and it has no additional cost. Opaque URIs for predicates are the digital equivalent of thumbing your nose at the people you should be courting -- the people who will actually use your ontology in any practical sense. It says: "We don't care about you enough to make your life one step easier by having something that's memorable." You will always have to go back to the ontology every time and reread this documentation, over and over and over again. Do you have some expectation that in order for the data to be useful your relational or object database identifiers must be readable? Identifiers for objects, no. The table names and field names? Yes. How many DBAs do you know that create tables with opaque identifiers for the column names? How many XML schemas do you know that use opaque identifiers for the element names? My count is 0 from many many many instances. And the reason is the same as having readable predicate URIs -- so that when you look at the table, schema, ontology, triple or what have you, there is some mnemonic value from the name to its intent. By whom, and in English? This to me is a frankly colonial assumption of the dominance of English in the world of metadata. In the world of computing in general: 'for', 'if', 'while' ... all English. While there are Turing-complete languages out there, the ones that don't have real-world language constructions are toys, like Whitespace for example. Even the lolcats programming language is more usable than Whitespace. Again, it's a cost/value consideration. There are many people who will understand English, and when developers program, they're surrounded by it. If your intended audience is primarily people who speak French, then you would be entirely justified in using URIs with labels from French. Or Chinese, though the IRI expansion would be more of a pain :) The proper understanding of the semantics, although still relatively minimal, is from the definition, not the URI. Yes. Any short cuts to *understanding* rather than *remembering* are to be avoided. Our coining and inclusion of multilingual (eventually) lexical URIs based on the label is a concession to developers who feel that they can't effectively 'use' the vocabularies unless they can read the URIs. So in my opinion (as is everything in this mail, of course), this is even worse. Now instead of 1600 property URIs, you have 1600 * (number of languages + 1). And you're going to see them appearing in uses of the ontology. Either stick with your opaque identifiers or pick a language for the readable ones -- best practice would be English -- but doing both is a disaster in the making. I grant that writing ad hoc SPARQL queries with opaque URIs can be intensely frustrating, but the vocabularies aren't designed specifically to support that incredibly narrow use case. Writing queries is something developers have to do to work with data.
More importantly, writing code that builds the triples in the first place is something that developers have to do. And they have to get it right... which they likely won't do first time. There will be typos. That P1523235 might be written into the code as P1533235... an impossible-to-spot typo. dc:title vs dc:titel... a bit easier to spot, no? So the consequence is that the quality of the uses of your ontology will go down. If there were 16 fields, maybe there'd be a chance of getting it right. But 1600, with five-digit identifiers, is asking for trouble. Compare MARC fields. We all love our 245$a, I know, but dc:title is a lot easier to recall. Now imagine those fields are (seemingly) random five-digit codes without significant structure. And that there's 1600 of them. And you're asking the developer to use a graph structure that's likely unfamiliar to them. All in my opinion, and all debatable. I hope that your choice goes well for you, but would like other people to think about it carefully before following suit. Rob
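One common mitigation for the typo risk Rob describes is to confine the opaque codes to a single, audited module of named constants, so a P1523235/P1533235-style slip can only happen in one place. A sketch in Python with rdflib; the property numbers and their meanings here are hypothetical, not checked against the real RDA registry.

    # Opaque codes defined once, referenced everywhere else by name.
    from rdflib import Graph, Namespace, URIRef, Literal

    RDAM = Namespace("http://rdaregistry.info/Elements/m/")

    # single point of definition, verified once against the registry docs
    TITLE_PROPER = RDAM.P30156   # hypothetical mapping
    EXTENT = RDAM.P30182         # hypothetical mapping

    g = Graph()
    manifestation = URIRef("http://example.org/manifestation/1")
    g.add((manifestation, TITLE_PROPER, Literal("Moby Dick")))
    g.add((manifestation, EXTENT, Literal("1 volume")))

This narrows the risk rather than removing it: the constant names are readable and typo-resistant at every use site, and a mistaken code is a one-line fix.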
Re: [CODE4LIB] Fwd: [rules] Publication of the RDA Element Vocabularies
P166123464771 And now no one understands at all. CIDOC-CRM has taken the same approach -- it's better that everyone is equal in their non-comprehension than that people who speak a particular language are somehow advantaged. BTW, as an English speaker, I also don't understand "other designation associated with the corporate body", regardless of spaces or camelCase. Labels and semantic descriptions are *always* important. The "we might change what this means" argument is also problematic -- if you change what it means, then you should change the URI! Otherwise people will continue to use them incorrectly, plus the legacy data generated with the previous definition will suddenly change what it's saying. Finally, 1600 properties... good luck with that. Rob On Wed, Jan 22, 2014 at 3:03 PM, Hamilton, Gill g.hamil...@nls.uk wrote: Je ne comprends pas l'anglais. Je ne comprends pas l'URI otherDesignationAssociatedWithTheCorporateBody. [I don't understand English. I don't understand the URI otherDesignationAssociatedWithTheCorporateBody.] 私は日本人です。私は理解していない、そのURI [I am Japanese. I don't understand that URI.] Opaque URIs with human-readable labels help in an international context. Just my two yens' worth :) G - Gill Hamilton Digital Access Manager National Library of Scotland George IV Bridge Edinburgh EH1 1EW, Scotland e: g.hamil...@nls.uk t: +44 (0)131 623 3770 Skype: gill.hamilton.nls From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Dan Scott [deni...@gmail.com] Sent: 22 January 2014 21:10 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Fwd: [rules] Publication of the RDA Element Vocabularies Hi Karen: On Wed, Jan 22, 2014 at 3:16 PM, Karen Coyle li...@kcoyle.net wrote: I can't address the first points, but I can speak a bit to the question of meaningful URIs. In the original creation of the RDA elements, meaningful URIs were used, based on the actual RDA terminology. This resulted in URIs like: http://rdvocab.info/Elements/alternativeChronologicalDesignationOfLastIssueOrPartOfSequence and http://rdvocab.info/Elements/alternativeChronologicalDesignationOfLastIssueOrPartOfSequenceManifestation Not only that, the terminology for some elements changed over time, which in some cases meant deprecating a property whose name had become overly confusing. Now, I agree that one possibility would have been for the JSC to develop meaningful but reasonably short property names. Another possibility is that we cease looking at URIs and begin to work with labels, since URIs are for machines and labels are for humans. Unfortunately, much RDF software still expects you to work with the underlying URI rather than the human-facing label. We need to get through that stage as quickly as possible, because it's causing us to put effort into URI naming that would be best used for other analysis activities. Thanks for responding on this front. I understand that, while the vocabulary was in heavy active development, it might have been painful to adjust as elements changed. But given that this marks the actual publication of the vocabulary, that churn should have settled down, and then this part of the JSC's contribution to the semantic web could have semantics applied at both the micro and macro level. I guess I see URIs as roughly parallel to API names; as long as humans are assembling programs, we're likely to benefit from having meaningful (no air quotes required) names... even if sometimes the meaning drifts over time and the code APIs need to be refactored. Dealing with sequentially numbered alphanumeric identifiers reminds me rather painfully of MARC.
For what it's worth (and it might not be worth much): curl http://rdaregistry.info/Elements/a/P50101 | grep reg:name | sort | uniq -c shows that the reg:name property is unique across all of the agent properties, at least. Remnants of the earlier naming effort? If that pattern holds, those could simply have been used for the identifiers in place of P#. The most unwieldy of those appears to be otherDesignationAssociatedWithTheCorporateBody (which _is_ unwieldy, certainly, but still more meaningful than http://rdaregistry.info/Elements/a/P50033). Perhaps it's not too late?
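The same uniqueness check can be done over the parsed graph rather than with grep, which avoids false matches in the raw XML. A sketch in Python with rdflib; it assumes the registry still serves RDF/XML at that URI and that reg:name lives in the metadataregistry.org "regap" namespace -- both assumptions worth verifying against the registry itself.

    # Count reg:name values and report any duplicates.
    from collections import Counter
    from rdflib import Graph, Namespace

    REG = Namespace("http://metadataregistry.org/uri/profile/regap/")  # assumed namespace
    g = Graph()
    g.parse("http://rdaregistry.info/Elements/a/", format="xml")  # assumes RDF/XML is served

    names = Counter(str(o) for o in g.objects(None, REG.name))
    duplicates = {n: c for n, c in names.items() if c > 1}
    print(duplicates or "reg:name is unique across the agent properties")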
Re: [CODE4LIB] rdf ontologies for archival descriptions
Have you considered the LOCAH work in mapping EAD into Linked Data? http://archiveshub.ac.uk/locah/ and http://data.archiveshub.ac.uk/ Rob On Sun, Jan 19, 2014 at 5:10 PM, Ben Companjen ben.compan...@dans.knaw.nl wrote: Hi Eric, While I'm no archivist by training (information systems engineer I am), I've learned a thing or two from having to work with EAD and the standard it is based on, ISAD(G) (all citations below are from ISAD(G), 2nd edition). As with all information modelling, either inside or outside the Linked Data domain, you should take a step back to look at the goal of the description. When you have a list of what you want to describe, you can start looking for ontologies. You probably know this, but I was triggered to respond by "Because many archival descriptions are rooted in MARC records, and MODS is easily mapped from MARC." IMO archival descriptions are rooted in rules for description, not a specific file format. So, when I think of (some of) the essences of archival description, I think of:

- "The purpose of archival description is to identify and explain the context and content of archival material in order to promote its accessibility. This is achieved by creating accurate and appropriate representations and by organizing them in accordance with predetermined models." (§I.2)
- ... seven areas of descriptive information:
  1. Identity Statement Area (where essential information is conveyed to identify the unit of description)
  2. Context Area (where information is conveyed about the origin and custody of the unit of description)
  3. Content and Structure Area (where information is conveyed about the subject matter and arrangement of the unit of description)
  4. Condition of Access and Use Area (where information is conveyed about the availability of the unit of description)
  5. Allied Materials Area (where information is conveyed about materials having an important relationship to the unit of description)
  6. Note Area (where specialized information and information that cannot be accommodated in any of the other areas may be conveyed)
  7. Description Control Area (where information is conveyed on how, when and by whom the archival description was prepared). (§I.11)

There is a distinction between the thing being described and the description itself, and both have an important role within the archival description. (If anything so far causes confusion with anyone here, I misunderstood and accept to be corrected :)) NB: this is one way of thinking of descriptions. Incorporating the PROV ontology would make sense for expressing more/other aspects of the provenance of archival entities, but I haven't got round to becoming an expert on PROV yet ;) ISAD(G) lists 26 elements that may be combined to constitute the description of an archival entity. Trying to translate these 'elements', I'd end up with possibly a lot more than 26 RDFS/OWL properties. *Depending on the type of archival entity you can/should of course use more specific ontologies.* Let me list some properties and related ontologies.

# Identity statement area

## Identifiers
The URI, naturally, and other IDs. Could be linked using dc(terms):identifier, or mods:identifier, or other ontologies. Ideally there is some way of linking the domain of the ID to the ID itself, because "box 101" is likely not unique in the universe. Perhaps you want to publish a URI strategy separately to explain how the URI was assembled/derived.

## Title
Again DC(terms), MODS, RDA.

## Date(s)
You want properties that have a clear meaning.
For example, dcterms:created and mods:dateCreated assume it is clear what "when the resource was created" means. DC terms are vague, I mean general, on purpose. You could create some properties `owl:subPropertyOf` the dcterms date properties for this. I'd look into EDTF for encoding uncertain dates and ranges and BCE dates (MODS doesn't support BCE dates).

## Level of description
What kind of 'documentary unit' does the description describe? A whole building's contents or one piece of paper? I don't know of any ontology with the terms fonds, …, file, item, but you could say `http URI rdf:type fonds class URI`.

## Extent and medium
Saying anything about extent and medium should ideally only happen on the lowest level of description. Any higher-level extent and medium should be calculated by aggregating lower-level descriptions. On the lowest level, refer to class URIs. A combination of dimensions and material {c|sh}ould be a class, e.g. "A4 paper, 80 grams/square meter".

# Context area

## Creator(s) and administrative/biographical history
As ISAD(G) refers to ISAAR(CPF) for description of corporate bodies, people, and families, this is a perfect example of where existing people- and organisation-describing ontologies like FOAF, BIO, ORG, and others are useful for separate descriptions of the creators.
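A minimal sketch (Python with rdflib) of an Identity Statement Area along the lines Ben describes: dcterms for identifier, title, and date, an EDTF-style interval as a plain literal, and a hypothetical local class standing in for the level of description. All the example.org names are assumptions for illustration.

    # One archival fonds described with dcterms plus a local level-of-description class.
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/levels/")   # hypothetical local vocabulary
    g = Graph()

    fonds = URIRef("http://example.org/archives/fonds/101")
    g.add((fonds, RDF.type, EX.Fonds))                        # level of description
    g.add((fonds, DCTERMS.identifier, Literal("fonds 101")))
    g.add((fonds, DCTERMS.title, Literal("Records of the Example Society")))
    g.add((fonds, DCTERMS.created, Literal("1898/1932")))     # EDTF-style interval

    print(g.serialize(format="turtle"))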
Re: [CODE4LIB] archiving web pages
For what it's worth, the latest Wayback code is: https://github.com/iipc/openwayback It is being developed by the IIPC consortium, rather than just the Internet Archive alone, and it has many additional features contributed by other members. It should be used in preference to the SourceForge version, IMO. Rob On Tue, Jan 14, 2014 at 10:00 AM, L Snider lsni...@gmail.com wrote: Hi Kathryn, Right now the WARC format is considered the best preservation format for websites/social media, in terms of digital archives. It is our best guess right now. It will likely be with us for a long time, because it has been adopted by most of the major players. The way I have seen WARCs served up is through Wayback, the manual version of the Internet Archive's Wayback Machine. http://archive-access.sourceforge.net/projects/wayback/index.html I have only used Heritrix and Wayback together, so I haven't played with Wayback and WARCs made another way. I would stick with WARC in terms of preservation; access is another story... that would depend on budget, time, etc. Hope that helps. Cheers Lisa -- Lisa Snider Electronic Records Archivist Harry Ransom Center The University of Texas at Austin P.O. Box 7219 Austin, Texas 78713-7219 P: 512-232-4616 www.hrc.utexas.edu On Tue, Jan 14, 2014 at 10:48 AM, Kathryn Frederick (Library) kfred...@skidmore.edu wrote: Hi, I'm trying to develop a strategy for preserving issues of our school's online newspaper. Creating a WARC file of the content seems straightforward, but how will that content fare long-term? Also, how is the WARC served to an end-user? Is there some other method I should look at? Thanks in advance for any advice! Kathryn
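Before serving a WARC through Wayback, it can help to inspect what it actually contains. A sketch using the warcio library (pip install warcio) -- warcio is one of several WARC readers and is an assumption here, as is the filename -- that lists the captured pages:

    # List the response records (URI and capture date) in a WARC file.
    from warcio.archiveiterator import ArchiveIterator

    with open("newspaper-crawl.warc.gz", "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":
                uri = record.rec_headers.get_header("WARC-Target-URI")
                date = record.rec_headers.get_header("WARC-Date")
                print(date, uri)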
Re: [CODE4LIB] archiving web pages
Here are several to consider: * http://www.webarchive.org.uk/wayback/archive/*/http://www.aboutmayfair.co.uk/ * http://webarchive.loc.gov/lcwa0015/*/http://lawprofessors.typepad.com/adminlaw/ * http://www.padi.cat:8080/wayback/*/http://www.ajberga.cat/ * http://vefsafn.is/index.php?page=english Hope that helps :) Rob On Tue, Jan 14, 2014 at 10:31 AM, Nathan Tallman ntall...@gmail.com wrote: Lisa, Is your local web archive available online? I'd like to see a production example of a non-Internet Archive instance of Wayback/OpenWayback. Thanks, Nathan On Tue, Jan 14, 2014 at 12:17 PM, L Snider lsni...@gmail.com wrote: Rob is right on! I included the wrong link, thanks for catching that... Cheers Lisa
Re: [CODE4LIB] The lie of the API
Hi Richard, On Sun, Dec 1, 2013 at 4:25 PM, Richard Wallis richard.wal...@dataliberate.com wrote: "It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules." Don't wish your implementation problems on the consumers of your data. There are [you would hope] far more of them than of you ;-) Content negotiation is an already established mechanism -- why invent a new, and different, one just for *your* data? I should have been clearer here that I was responding to the original blog post. I'm not advocating arbitrary APIs, but instead just using Link headers between the different representations. The advantages are that the caching issues (both browser and intermediate caches) go away as the content is static, you don't need to invent a way to find out which formats are available (e.g. no arbitrary content in a 300 response), and you can simply publish the representations as any other resource without server-side logic to deal with conneg. The disadvantages are... none. There's no invention of APIs; it's just following a simpler route within the HTTP spec. Put yourself in the place of your consumer having to get their head around yet another site-specific API pattern. As a consumer of my own data, I would rather do a simple GET on a URI than mess around constructing the correct Accept header. As to discovering then using the (currently implemented) URI returned from a content-negotiated call: the standard HTTP libraries take care of that, like any other HTTP redirects (301, 303, etc.), plus you are protected from any future backend server implementation changes. No they don't, as there's no way to know which representations are available via conneg, and hence no automated way to construct the Accept header. Rob
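From the client side, the Link-header approach Rob describes is a plain GET followed by a dictionary lookup. A sketch with Python requests, which parses the Link response header into response.links; the URL and the advertised alternate are hypothetical.

    # Discover and follow an alternate representation advertised via a Link header.
    import requests

    resp = requests.get("http://example.org/record/1")
    print(resp.links)
    # e.g. {'alternate': {'url': 'http://example.org/record/1.json',
    #                     'rel': 'alternate', 'type': 'application/json'}}

    alt = resp.links.get("alternate")
    if alt:
        data = requests.get(alt["url"]).json()

No Accept header construction, no negotiation: the available representations are discoverable from the response itself.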
Re: [CODE4LIB] The lie of the API
On Sun, Dec 1, 2013 at 5:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote: +1 to all of Richard's points here. Making something easier for you to develop is no justification for making it harder to consume or deviating from well-supported standards. I'm not suggesting deviating from well-supported standards, I'm suggesting choosing a different approach within the well-supported standard that makes it easier for both consumer and producer. [Robert] You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works; instead you need server-side processing. If we introduce languages into the negotiation, this won't scale. Sure, there are situations where the number of variants is so large that including them all would be a nuisance. The number of times this actually happens is (in my experience at least) vanishingly small. Again, I'm not suggesting an arbitrary API; I'm saying that there are easier ways to accomplish the 99% of cases than conneg. [Robert] This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. Don't know caches intimately, but I don't see why that's algorithmically difficult. Just look at the Content-Type of the response. Is it harder for caches to examine headers than content or URI? (That's an earnest, perhaps naïve, question.) If we are talking about caching on the client here (not caching proxies), I would think in most cases requests are issued with the same Accept-* headers, so caching will work as expected anyway. I think Joe already discussed this one, but there's an outstanding conneg caching bug in Firefox, and it took even Squid a long time to implement content-negotiation-aware caching. Also note, "much harder", not "impossible" :) No conneg: * Check if we have the URI. Done. O(1) as it's a hash. Conneg: * Check if we have the URI. Parse the Accept headers from the request. Check if they match the cached content and don't contain wildcards. O(quite a lot more than 1) [Robert] Link headers can be added with a simple Apache configuration rule, and as they're static they are easy to cache. So the server side is easy, and the client side is trivial. Hadn't heard of these. (They are on Wikipedia so they must be real.) What do they offer over HTML link elements populated from the Dublin Core Element Set? Nothing :) They're link elements in a header so you can use them in non-HTML representations. For whatever it's worth... great topic, though, thanks Robert :) Welcome :) Rob
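A toy sketch of the cache-key difference Rob outlines: for an un-negotiated resource the key is the URI alone; a conneg-aware cache must also normalize the Accept header (and every header named by Vary) into the key. A Python illustration under those assumptions, not any particular cache's real implementation:

    # Cache keys with and without content negotiation.
    def key_without_conneg(uri):
        return uri                              # one hash lookup

    def key_with_conneg(uri, request_headers, vary=("Accept", "Accept-Language")):
        parts = [uri]
        for h in vary:                          # normalize each Vary-ed header
            value = request_headers.get(h, "")
            normalized = ",".join(sorted(v.strip() for v in value.split(",") if v.strip()))
            parts.append(f"{h}={normalized}")
        return "|".join(parts)

    print(key_with_conneg("http://example.org/r/1",
                          {"Accept": "application/json, text/html;q=0.5"}))

And even this skips wildcard matching and q-value comparison, which is what pushes a real conneg-aware cache well past O(1).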
Re: [CODE4LIB] The lie of the API
To be (more) controversial... If it's okay to require headers, why can't API keys go in a header rather than the URL? Then it's just the same as content negotiation, it seems to me: you send a header and get a different response from the same URI. Rob On Mon, Dec 2, 2013 at 10:57 AM, Edward Summers e...@pobox.com wrote: On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote: I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage. A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your API: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn't work? These questions are difficult to ask if they are just an IP address in your access log. //Ed
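What Rob describes is one line on the client side. A sketch with Python requests; "X-API-Key" is a hypothetical header name (services vary), and the URL and key are placeholders:

    # Same URI, different response, selected by request headers.
    import requests

    resp = requests.get("http://example.org/api/records/1",
                        headers={"X-API-Key": "my-secret-key",
                                 "Accept": "application/json"})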
Re: [CODE4LIB] The lie of the API
(Posted in the comments on the blog and reposted here for further discussion, if there's interest.) While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the "use content negotiation" conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations, for the several reasons below. It's harder to implement content negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API; we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API. Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works; instead you need server-side processing. This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. For large-scale systems like DPLA and Europeana, caching is essential for quality of service. How do you find out which formats are supported by conneg? By reading the documentation. Which could just say "add .json on the end". The Vary header tells you that negotiation in the format dimension is possible, just not what to do to actually get anything back. There isn't a way to find this out from HTTP automatically, so now you need to read both the site's docs AND the HTTP docs. APIs can, on the other hand, do this. Consider OAI-PMH's ListMetadataFormats and SRU's Explain response. Instead you can have a separate URI for each representation and link them with Link headers, or just a simple rule like "add '.json' on the end". No need for complicated content negotiation at all. Link headers can be added with a simple Apache configuration rule, and as they're static they are easy to cache. So the server side is easy, and the client side is trivial. Compared to being difficult at both ends with content negotiation. It can be useful to make statements about the different representations, especially if you need to annotate the structure or content. Or share them -- you can't email someone a link that includes the right Accept headers to send; as in the post, you need to send them a command line like curl with -H. An experiment for fans of content negotiation: have both .json and 302-style conneg from your original URI to that .json file. Advertise both. See how many people do the conneg. If it's non-zero, I'll be extremely surprised. And a challenge: even with libraries there's still complexity to figuring out how and what to serve. Find me sites that correctly implement *-based fallbacks. Or even process q values. I'll bet I can find 10 that do content negotiation wrong for every 1 that does it correctly. I'll start: dx.doi.org touts its content negotiation for metadata, yet doesn't implement q values or *s. You have to go to the documentation to figure out what Accept headers it will do string-equality tests against.
Rob On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland svhoo...@ulb.ac.be wrote: Dear all, I guess some of you will be interested in the blogpost of my colleague and co-author Ruben regarding the misunderstandings on the use and abuse of APIs in a digital libraries context, including a description of both good and bad practices from Europeana, DPLA and the Cooper Hewitt museum: http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/ Kind regards, Seth van Hooland Président du Master en Sciences et Technologies de l'Information et de la Communication (MaSTIC) Université Libre de Bruxelles Av. F.D. Roosevelt, 50 CP 123 | 1050 Bruxelles http://homepages.ulb.ac.be/~svhoolan/ http://twitter.com/#!/sethvanhooland http://mastic.ulb.ac.be 0032 2 650 4765 Office: DC11.102
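To make Rob's q-value challenge concrete, here is a rough sketch of just the parsing step a server needs before it can even start matching *-based fallbacks. This is a simplification (no media-type parameters, no input validation, no tie-breaking by specificity), which is rather the point:

    # Parse an Accept header into (media type, q) pairs, best first.
    def parse_accept(header):
        prefs = []
        for item in header.split(","):
            parts = item.strip().split(";")
            mediatype = parts[0].strip()
            q = 1.0
            for p in parts[1:]:
                name, _, value = p.strip().partition("=")
                if name == "q":
                    q = float(value)
            prefs.append((mediatype, q))
        return sorted(prefs, key=lambda mq: mq[1], reverse=True)

    print(parse_accept("application/rdf+xml;q=0.8, text/html, */*;q=0.1"))
    # [('text/html', 1.0), ('application/rdf+xml', 0.8), ('*/*', 0.1)]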
Re: [CODE4LIB] Loris
Hi Andrew, Not exactly sure what sort of differences you're after... Do you mean the difference between this: http://iipimage.sourceforge.net/documentation/protocol/ (and its 74-page reference: http://iipimage.sourceforge.net/IIPv105.pdf ) And this: http://www-sul.stanford.edu/iiif/image-api/1.1/ ? Rob On Fri, Nov 8, 2013 at 10:58 PM, Andrew Hankinson andrew.hankin...@gmail.com wrote: So what's the difference between IIIF and IIP? (the protocol, not the server implementation) -Andrew On Nov 8, 2013, at 9:05 PM, Jon Stroop jstr...@princeton.edu wrote: It aims to do the same thing... serve big JP2s (and other images) over the web, so from that perspective, yes. But beyond that, time will tell. One nice thing about coding against a well-thought-out spec is that there are lots of implementations from which you can choose[1] -- though as far as I know Loris is the only one that supports the IIIF syntax natively (maybe IIP?). We still have Djatoka floating around in a few places here, but, as many people have noted over the years, it takes a lot of shimming to scale it up, and, as far as I know, the project has more or less been abandoned. I haven't done too much in the way of benchmarking, but to date I don't have any reason to think Loris can't perform just as well. The demo I sent earlier is working against a very large JP2 with small tiles[2], which means a lot of rapid hits on the server, and between that, (a little bit of) JMeter and ab testing, and a fair bit of concurrent use from the c4l community this afternoon, I feel fairly confident about it being able to perform as well as Djatoka in a production environment. By the way, you can page through some other images here: http://libimages.princeton.edu/osd-demo/ Not much of an answer, I realize, but, as I said, time and usage will tell. -Js 1. http://iiif.io/apps-demos.html 2. http://libimages.princeton.edu/loris/pudl0052%2F6131707%2F0001.jp2/info.json On 11/8/13 8:07 PM, Peter Murray wrote: A clarifying question: is Loris effectively a Python-based replacement for the Java-based djatoka[1] server? Peter [1] http://sourceforge.net/apps/mediawiki/djatoka/index.php?title=Main_Page On Nov 8, 2013, at 3:05 PM, Jon Stroop jstr...@princeton.edu wrote: c4l, I was reminded earlier this week at DLF (and a few minutes ago by Tom and Simeon) that I hadn't ever announced to this list a project I've been working on for the last year or so. I showed an early version in a lightning talk at code4libcon last year. Meet Loris: https://github.com/pulibrary/loris Loris is a Python-based image server that implements the IIIF Image API version 1.1, level 2[1]. http://www-sul.stanford.edu/iiif/image-api/1.1/ It can take JP2 (if you make Kakadu available to it), TIFF, or JPEG source images, and hand back JPEG, PNG, TIF, and GIF (why not...). Here's a demo of the server directly: http://goo.gl/8XEmjp And here's a sample of the server backing OpenSeadragon[2]: http://goo.gl/Gks6lR -Js 1. http://www-sul.stanford.edu/iiif/image-api/1.1/ 2. http://openseadragon.github.io/ -- Jon Stroop Digital Initiatives Programmer/Analyst Princeton University Library jstr...@princeton.edu -- Peter Murray Assistant Director, Technology Services Development LYRASIS peter.mur...@lyrasis.org +1 678-235-2955 800.999.8558 x2955
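For a sense of the IIIF side of that comparison, the Image API 1.1 spec Rob links defines requests as a URI template, {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. A sketch that builds such a URL; the server base and identifier are hypothetical:

    # Build a IIIF Image API 1.1 request URL from its path segments.
    def iiif_url(base, identifier, region="full", size="full",
                 rotation="0", quality="native", fmt="jpg"):
        return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

    # a 512px-wide rendering of one detail of an image
    print(iiif_url("http://example.org/iiif", "page0001",
                   region="100,100,2000,2000", size="512,"))
    # -> http://example.org/iiif/page0001/100,100,2000,2000/512,/0/native.jpg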
Re: [CODE4LIB] rdf serialization
Yes, I'm going to get sucked into this vi vs. emacs argument for nostalgia's sake. From the linked, very outdated article: "In fact, as far as I know I've never used an RDF application, nor do I know of any that make me want to use them. So what's wrong with this picture?" a) Nothing. You would never know if you've used a CORBA application either. Or (insert infrastructure technology here) application. b) You've never been to the BBC website? You've never used anything that pulls in content from remote sites? Oh wait, see (a). c) I've never used a Topic Maps application. (And see (a).) "I find most existing RDF/XML entirely unreadable." Patient: Doctor, Doctor, it hurts when I use RDF/XML! Doctor: Don't Do That Then. (aka #DDTT) Already covered in this thread: I'm a strong proponent of JSON-LD. "I think that when we start to bring on board metadata-rich knowledge monuments such as WorldCat ..." See VIAF in this thread. See, if you must, BIBFRAME in this thread. There /are/ challenges with RDF, not going to argue against that. And in fact I /have/ recently argued for it: http://www.cni.org/news/video-rdf-failures-linked-data-letdowns/ But for the vast majority of cases, the problems are solved (JSON-LD) or no one cares any more (httpRange-14). Named Graphs (those quads used by "crazies" you refer to) solve the remaining issues, but aren't standard yet. They are, however, cleverly baked into JSON-LD for the time that they are. On Tue, Nov 5, 2013 at 2:48 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: Ross Singer rossfsin...@gmail.com wrote: This is definitely where RDF outclasses almost every alternative*. Having said that, there are tuples of many kinds; it's only that the triplet is the most used under the W3C banner. Many are using a more expressive quad, "a few crazies", for example (ad hominem? really? Your argument ceased to be valid right about here), even though that may or may not be a better way of dealing with it. In the end, it all comes down to some variation over frames theory (or bundles); a serialisation of key/value pairs with some ontological denotation for what the semantics of that might be. Except that RDF follows the web architecture through the use of URIs for everything. That is not to be under-estimated in terms of scalability and long-term usage. But wait, there's more! We haven't touched upon the next layer of the cake: OWL, which is, more or less, an ontology for dealing with all things knowledge and web. And it kinda puzzles me that it is not more often mentioned (or used) in the systems we make. A lot of OWL was tailored towards being a better language for expressing knowledge (which in itself comes from the DAML and OIL ontologies), and then there's RDFS, and OWL in various formats, and then ... Your point? You don't like an ontology? #DDTT Complexity. The problem, as far as I see it, is that there's not enough expression and rigor for the things we want to talk about in RDF, but we don't want to complicate things with OWL or RDFS either. That's no more a problem of RDF than of any other system. And then there's that tedious distinction between a web resource and something that represents the thing in reality, which RDF skipped (and hacked a 303 solution to). It's all a bit messy. That RDF skipped? No, *RDF* didn't skip it, nor did RDF propose the *303* solution. You can use URIs to identify anything.
The 303/httprange14 issue is what happens when you *dereference* a URI that identifies something that does not have a digital representation because it's a real world object. It has a direct impact on RDF, but came from the TAG not the RDF WG. http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14 And it's not messy, it's very clean. What it is not, is pragmatic. URIs are like kittens ... practically free to get, but then you have a kitten to look after and that costs money. Thus doubling up your URIs is increasing the number of kittens you have. [though likely not, in practice, doubling the cost] * Unless you're writing a parser, then having a kajillion serializations seriously sucks. Some of us do. And yes, it sucks. I wonder about non-political solutions ever being possible again ... This I agree with. Rob
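A sketch of what the 303 dance looks like on the wire when dereferencing a URI for a real-world thing. DBpedia is used here as a well-known example of the pattern; its exact redirect behaviour may have changed, so treat the output as illustrative.

    # Dereference a real-world-object URI and show the redirect hops.
    import requests

    resp = requests.get("http://dbpedia.org/resource/Moby-Dick",
                        headers={"Accept": "application/rdf+xml"})
    for hop in resp.history:
        print(hop.status_code, hop.url)   # expect a 303 See Other hop
    print(resp.status_code, resp.url)     # the document *about* the thing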
Re: [CODE4LIB] rdf serialization
You're still missing a vital step. Currently your assertion is that the creator /of a web page/ is Jefferson, which is clearly false. The page (...) is a transcription of the Declaration of Independence. The Declaration of Independence is written by Jefferson. Jefferson is male. And it's not very hard given the right mindset -- it's just a fully expanded relational database, where the identifiers are URIs. Yes, it's not 1st-year computer science, but it is 2nd or 3rd year rather than postgraduate. Which is not to say that people do not have great trouble succinctly articulating knowledge, but like any skill, it can be learned. Just look at the variation in the ways of writing papers... some people can do it very clearly, some have much more difficulty. And with JSON-LD, you don't have to understand the RDF, just a clean representation of it. Rob On Sun, Nov 3, 2013 at 1:45 PM, Eric Lease Morgan emor...@nd.edu wrote: Cool input. Thank you. I believe I have tweaked my assertions:

1. The Declaration of Independence was written by Thomas Jefferson

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://www.archives.gov/exhibits/charters/declaration_transcript.html">
        <dc:creator>http://id.loc.gov/authorities/names/n79089957</dc:creator>
      </rdf:Description>
    </rdf:RDF>

2. Thomas Jefferson is a male person

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:about="http://id.loc.gov/authorities/names/n79089957">
        <foaf:Person foaf:gender="male" />
      </rdf:Description>
    </rdf:RDF>

Using no additional vocabularies (ontologies), I think my hypothetical Linked Data spider / robot ought to be able to assert the following:

3. The Declaration of Independence was written by Thomas Jefferson, a male person

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <rdf:Description rdf:about="http://www.archives.gov/exhibits/charters/declaration_transcript.html">
        <dc:creator>
          <foaf:Person rdf:about="http://id.loc.gov/authorities/names/n79089957">
            <foaf:gender>male</foaf:gender>
          </foaf:Person>
        </dc:creator>
      </rdf:Description>
    </rdf:RDF>

The W3C Validator validates Assertion #3, and returns the attached graph, which illustrates the logical combination of Assertions #1 and #2. This is hard. The Semantic Web (and RDF) is an attempt at codifying knowledge using a strict syntax, specifically a strict syntax of triples. It is very difficult for humans to articulate knowledge, let alone codify it. How realistic is the idea of the Semantic Web? I wonder this not because I don't think the technology can handle the problem. I say this because I think people can't (or have great difficulty) succinctly articulate knowledge. Or maybe knowledge does not fit into triples? -- Eric Morgan University of Notre Dame
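To make Rob's JSON-LD point above concrete, here is Eric's assertion #3 rendered as JSON-LD, written as a plain Python dict for consistency with the rest of these sketches. The developer sees a clean key/value structure rather than RDF/XML; the context prefixes are the same namespaces Eric uses.

    # Assertion #3 as a JSON-LD document.
    import json

    doc = {
        "@context": {
            "dc": "http://purl.org/dc/elements/1.1/",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@id": "http://www.archives.gov/exhibits/charters/declaration_transcript.html",
        "dc:creator": {
            "@id": "http://id.loc.gov/authorities/names/n79089957",
            "@type": "foaf:Person",
            "foaf:gender": "male",
        },
    }
    print(json.dumps(doc, indent=2))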
[CODE4LIB] ANN: Memento Client for Chrome
Dear all, We are delighted to be able to announce the availability of the beta Memento extension for Chrome. The extension is available in the Chrome store: https://chrome.google.com/webstore/detail/memento/jgbfpjledahoajcppakbgilmojkaghgm?hl=engl=US Below, we include the description that accompanies the extension in the Chrome store, which highlights its web time travel and 404-circumventing features. Your feedback would be much appreciated to help us get it ready for prime time. We would like to take this opportunity to thank: - Harihar Shankar for the effort he invested in developing the extension. - Luydmila Balakireva, Martin Klein, Michael Nelson, James Powell, for their input during the development process. Many thanks, Rob Sanderson and Herbert Van de Sompel, Los Alamos National Laboratory == Description Travel to the past of the web by right-clicking pages and links. Memento for Chrome allows you to seamlessly navigate between the present web and the web of the past. It turns your browser into a web time travel machine that is activated by means of a Memento sub-menu that is available on right-click. First, select a date for time travel by clicking the black Memento extension icon. Now right-click on a web page, and click the "Get near …" option from the Memento sub-menu to see what the page looked like around the selected date. Do the same for any link in a page to see what the linked page looked like. If you hit one of those nasty "Page not Found" errors, right-click and select the "Get near current time" option to see what the page looked like before it vanished from the web. When on a past version of a page -- the Memento extension icon is now red -- right-click the page and select the "Get current time" option to see what it looks like now. Memento for Chrome obtains prior versions of pages from web archives around the world, including the massive web-wide Internet Archive, national archives such as the British Library and UK National Archives web archives, and on-demand web archives such as archive.is. It also allows time travel in all language versions of Wikipedia. There are two things Memento for Chrome cannot do for you: obtain a prior version of a page when none have been archived, and time travel into the future. Our sincere apologies for that. Technically, the Memento for Chrome extension is a client-side implementation of the Memento protocol, which extends HTTP with content negotiation in the datetime dimension. Many web archives have implemented server-side support for the Memento protocol, and, in essence, every content management system that supports time-based versioning can implement it. Technical details are in the Memento Internet Draft at http://www.mementoweb.org/guide/rfc/ID/. General information about the protocol, including a quick introduction, is available at http://mementoweb.org.
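At the protocol level, the datetime negotiation the extension performs looks like an ordinary GET with an Accept-Datetime header sent to a TimeGate. A sketch in Python; the LANL Time Travel service is used as an example TimeGate endpoint and may have moved, so treat the URL as an assumption.

    # Ask a Memento TimeGate for a snapshot of a page near a given date.
    import requests

    target = "http://www.cnn.com/"
    resp = requests.get("http://timetravel.mementoweb.org/timegate/" + target,
                        headers={"Accept-Datetime": "Tue, 14 Jan 2014 12:00:00 GMT"})
    print(resp.headers.get("Memento-Datetime"))  # archival datetime of the memento
    print(resp.url)                              # URI of the archived snapshot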
Re: [CODE4LIB] anti-harassment policy for code4lib?
+1, of course :) You might wish to consider some further derivatives/related pages: http://www.diglib.org/about/code-of-conduct/ http://wikimediafoundation.org/wiki/Friendly_space_policy https://thestrangeloop.com/about/policies http://www.apache.org/foundation/policies/anti-harassment.html Rob On Mon, Nov 26, 2012 at 3:57 PM, Mariner, Matthew matthew.mari...@ucdenver.edu wrote: +1 for all of the below Matthew C. Mariner Head of Special Collections and Digital Initiatives Assistant Professor Auraria Library 1100 Lawrence Street, Denver, CO 80204-2041 matthew.mari...@ucdenver.edu http://library.auraria.edu :: http://archives.auraria.edu On 11/26/12 3:51 PM, Tom Cramer tcra...@stanford.edu wrote: +1 for Bess's motion +1 for Roy's expansion to C4L online interactions as well as face-to-face +1 for Karen's focus on general inclusivity and fair play "For me the hardest thing is how one monitors and resolves issues that arise. As a group with no formal management, I suppose the conference organizers become the deciders if such a necessity arises. If it's elsewhere (email, IRC) -- that's a bit trickier. The Ada project's detailed guides should help, but if there is a policy it seems that there necessarily has to be some responsible body -- even if ad hoc." It seems to me that there would be tremendous benefit in having 1.) an explicit statement of the community norms around harassment and fair play in general. In the best case, this would help avoid uncomfortable or inappropriate situations before they occur. 2.) a defined process for handling any incidents that do arise, which in the case of this community I would imagine would revolve around reporting, communication, negotiation and arbitration rather than adjudication by a standing body (which I agree is hard to see in this crowd). I know several high schools have adopted peer arbitration networks for conflict resolution rather than referring incidents to the Principal's Office -- perhaps therein lies a model for us for any incidents that may not be resolved simply through dialogue. - Tom On Nov 26, 2012, at 2:32 PM, Karen Coyle wrote: Bess and Code4libbers, I've only been to one c4l conference and it was a very positive experience for me, but I also feel that this is too valuable of a community for us to risk it getting itself into crisis mode over some unintended consequences or a bad apple incident. For that reason I would support the adoption of an anti-harassment policy, in part for its consciousness-raising value. Ideally this would be not only about sexual harassment but would include general goals for inclusiveness and fair play within the community. And it would also serve as an acknowledgment that none of us is perfect, but we can deal with it. For me the hardest thing is how one monitors and resolves issues that arise. As a group with no formal management, I suppose the conference organizers become the deciders if such a necessity arises. If it's elsewhere (email, IRC) -- that's a bit trickier. The Ada project's detailed guides should help, but if there is a policy it seems that there necessarily has to be some responsible body -- even if ad hoc. kc On 11/26/12 2:16 PM, Bess Sadler wrote: Dear Fellow Code4libbers, I hope I am not about to get flamed. Please take as context that I have been a member of this community for almost a decade. I have contributed software, support, and volunteer labor to this community's events.
I have also attended the majority of code4lib conferences, which have been amazing and life-changing, and have helped me do my job a lot better. But, and I've never really known how to talk about this, those conferences have also been problematic for me a couple of times. Nothing like what happened to Noirin Shirley at ApacheCon (see http://geekfeminism.wikia.com/wiki/Noirin_Shirley_ApacheCon_incident if you're unfamiliar with the incident I mean) but enough to concern me that even in a wonderful community where we mostly share the same values, not everyone has the same definitions of acceptable behavior. I am watching the toxic fallout from the BritRuby conference cancellation with a heavy heart (go search for britruby conference cancelled if you want to catch up and/or get depressed). It has me wondering what more we could be doing to promote diversity and inclusiveness within code4lib. We have already had a couple of harassment incidents over the years, which I won't rehash here, which have driven away members of our community. We have also had other incidents that don't get talked about because sometimes one can feel that membership in a community is more important than one's personal boundaries or even safety. We should not be a community where people have to make that choice. I would like for us to consider adopting an anti-harassment policy for code4lib conferences. This is emerging
Re: [CODE4LIB] Code4lib 2013 Presentation Election now open!
I guess that you need to be logged in to vote? Perhaps a direct link in the text to where to log in, and where to request a new account? Thanks, Rob On Tue, Nov 13, 2012 at 11:15 AM, Becky Yoose b.yo...@gmail.com wrote: Not a voting problem per se, but the results page in IE9 [1] in Win7 threw up everywhere: http://screencast.com/t/lUnwFl8h Otherwise, yay new design :cD Thanks, Becky [1] Related: don't ask why I was in IE. On Mon, Nov 12, 2012 at 11:03 PM, Ross Singer rossfsin...@gmail.com wrote: http://vote.code4lib.org/election/24 Vote early, vote often, but most importantly, vote soon: the polls close sometime on the night of Monday the 19th of November (looking at the host that the diebold-o-tron runs on, I think it will be around 11 PM EST, but when they close, they close!). -Ross. p.s. given the new design, let me know if there are any voting problems.
Re: [CODE4LIB] Embedding XHTML into RDF
+1 Rob On Thu, Jan 12, 2012 at 9:26 AM, aj...@virginia.edu aj...@virginia.edu wrote: My inclination would be to keep the descriptive snippets in some kind of content store with a good RESTful Web exposure and just use those URLs as the values of description triples in your RDF. Then your RDF is genteel Linked Data and your XHTML can be easily available to integrating services. --- A. Soroka Online Library Environment the University of Virginia Library On Jan 11, 2012, at 11:00 PM, CODE4LIB automatic digest system wrote: From: Ethan Gruber ewg4x...@gmail.com Date: January 11, 2012 3:07:16 PM EST Subject: Re: Embedding XHTML into RDF People are going to use the YUI rich text editor and the output is run through tidy, so that should ensure the well-formedness of the HTML. Right now we have a system where thousands of small XHTML fragments exist as text files in a filesystem (edited manually, practically), which are rendered through wiki software. The fragments have RDFa attributes so that an RDFa python script can interpret wiki pages as RDF on the fly. We need to redesign the system from the ground up, and I'd like to use RDF as the source object. Ethan
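To make the pattern concrete, a minimal sketch in Python with rdflib; all URIs here are hypothetical stand-ins for the described object and its snippet in the content store:

    from rdflib import Graph, Namespace, URIRef

    DCTERMS = Namespace("http://purl.org/dc/terms/")

    g = Graph()
    # hypothetical URIs: the described object, and its XHTML snippet in the content store
    obj = URIRef("http://example.org/object/42")
    snippet = URIRef("http://example.org/store/descriptions/42.xhtml")

    # the triple's value is the snippet's URL, not the markup itself
    g.add((obj, DCTERMS.description, snippet))

    print(g.serialize(format="turtle"))

The RDF stays clean Linked Data; any service that wants the XHTML simply dereferences the snippet URL.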
Re: [CODE4LIB] Embedding XHTML into RDF
You might consider the Content in RDF specification: http://www.w3.org/TR/Content-in-RDF10/ which describes how to do this in a generic fashion, as opposed to stuffing it directly into a string literal. HTH Rob On Wed, Jan 11, 2012 at 12:36 PM, Ethan Gruber ewg4x...@gmail.com wrote: Hi all, Suppose I have RDF describing an object, and I would like some fairly free-form, human-generated description of the object (let's say within dcterms:description). Is it semantically acceptable to have XHTML nested directly in this element or would this be considered uncouth for LOD? Thanks, Ethan
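For comparison with the pointer-to-content-store approach above, a rough sketch of the Content in RDF idea in Python with rdflib; the cnt: class and property names follow the 2011 draft namespace as best I recall them, so verify against the spec before relying on this:

    from rdflib import BNode, Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    CNT = Namespace("http://www.w3.org/2011/content#")
    DCTERMS = Namespace("http://purl.org/dc/terms/")

    g = Graph()
    obj = URIRef("http://example.org/object/42")  # hypothetical object URI
    desc = BNode()

    # the description becomes a first-class content resource, not a bare string
    g.add((obj, DCTERMS.description, desc))
    g.add((desc, RDF.type, CNT.ContentAsText))
    g.add((desc, CNT.chars, Literal("<p>A <em>free-form</em> description.</p>")))
    g.add((desc, CNT.characterEncoding, Literal("UTF-8")))

    print(g.serialize(format="turtle"))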
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
On Thu, Dec 8, 2011 at 9:14 AM, BRIAN TINGLE brian.tingle.cdlib@gmail.com wrote: On Dec 7, 2011, at 2:19 PM, Robert Sanderson wrote: * Lax Security -- It's easier to get into trouble when you're simply inlining HTML received, compared to building the elements. Getting into the same bad habits as SQL injection. It might not be a big deal now, but it will be later on. I've been scratching my head about this one. Can someone elaborate on this? If you blindly include whatever you get back directly into the page, it might include either badly performing, out of date, or potentially malicious script tags that subsequently destroy the page. It's the equivalent of blindly accepting web form input into an SQL query and then wondering where your tables all disappeared off to. Rob
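The principle holds anywhere untrusted markup meets a page; here is a minimal Python sketch of the difference escaping makes (html.escape is in the standard library, and the payload string is of course invented):

    import html

    untrusted = '<script>alert("gotcha")</script>'

    # Unsafe: splicing the response straight into the page, script tags and all
    page_unsafe = "<div>%s</div>" % untrusted

    # Safer: escape first, so the payload renders as inert text
    page_safe = "<div>%s</div>" % html.escape(untrusted)

    print(page_unsafe)  # the script tag survives intact
    print(page_safe)    # &lt;script&gt;... cannot execute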
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
Here's some off the top of my head: * Separation of concerns -- You can keep your server side data transfer and change the front end easily by working with the javascript, rather than reworking both. * Lax Security -- It's easier to get into trouble when you're simply inlining HTML received, compared to building the elements. Getting into the same bad habits as SQL injection. It might not be a big deal now, but it will be later on. * Obfuscation -- It's easier to debug one layer of code rather than two at once. It's thus also easier to maintain the two layers of code, and easier to see at which end the system is failing. Rob On Wed, Dec 7, 2011 at 3:12 PM, Jonathan Rochkind rochk...@jhu.edu wrote: A fair number? Anyone but Godmar? On 12/7/2011 5:02 PM, Nate Vack wrote: OK. So we have a fair number of very smart people saying, in essence, it's better to build your HTML in javascript than send it via ajax and insert it. So, I'm wondering: Why? Is it an issue of data transfer size? Is there a security issue lurking? Is it tedious to bind events to the new / updated code? Something else? I've thought about it a lot and can't think of anything hugely compelling... Thanks! -Nate
Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community
LibLime A Division of PTFS, Inc. Main Office 11501 Huff Court North Bethesda, Maryland 20895 tel: (301) 654-8088 Ext. 127 fax: (301) 654-5789 email: kohai...@liblime.com Twitter: @liblime How about we all contact them? ;) Rob 2011/11/23 Wilfred Drew dr...@tc3.edu: Has anybody contacted the company? A sales rep? PR department? Bill Drew -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Parker, Anson (adp6j) Sent: Wednesday, November 23, 2011 12:09 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community This is pretty offensive on the liblime part, perhaps not surprising, but certainly low brow... I think best practices are to 1) blog it up 2) get a list of their clients and email them all to let them know what a bunch of schmarmy brats they are working with... make it hurt financially. It's not libel or slander as long as it is true. Going to New Zealand to play a legal game like this is way below the belt. -ap On 11/23/11 10:32 AM, Eric Lease Morgan emor...@nd.edu wrote: Just to make it easier, use the following link to read the article and then donate via PayPal -- http://bit.ly/rBeWN0 Open source software is about liberty, not gratis. --ELM
Re: [CODE4LIB] OAI Feeds
Without /any/ infrastructure it would be a challenge, but a simple database that has timestamps and basic metadata would be sufficient. The timestamps are the most important, obviously, to populate the feed correctly and handle the time slicing. Rob On Tue, Jun 21, 2011 at 8:55 AM, Eric Lease Morgan emor...@nd.edu wrote: On Jun 21, 2011, at 9:50 AM, Nathan Tallman wrote: Can anyone direct me towards documentation on creating an OAI feed from scratch, without a repository infrastructure? Setting up an OAI feed -- becoming an OAI data provider -- without a repository infrastructure would be challenging, to say the least. To learn more one would need to first read the OAI-PMH specification. [1] You would then need to write a program to support the OAI verbs (identify, listSets, etc.). All of the metadata in the resulting feed would need to come from some place, and this place is usually a database of some sort. You have a database listing your content, right? [1] specification - http://bit.ly/dJyAE3 -- Eric Lease Morgan University of Notre Dame Great Books Survey -- http://bit.ly/auPD9Q
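As a sketch of how little infrastructure "a simple database that has timestamps" really means, here is a hypothetical Python/sqlite3 backing for the ListRecords verb; the table layout is invented for illustration, and a real data provider still needs the full verb handling and error responses from the spec:

    import sqlite3

    conn = sqlite3.connect("oai.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS records (
        identifier TEXT PRIMARY KEY,
        datestamp  TEXT NOT NULL,  -- UTC ISO 8601, e.g. '2011-06-21T14:30:00Z'
        title      TEXT
    )""")

    def list_records(from_=None, until=None):
        """Time-sliced record listing: ISO datestamps compare lexicographically."""
        query = "SELECT identifier, datestamp, title FROM records"
        clauses, args = [], []
        if from_:
            clauses.append("datestamp >= ?")
            args.append(from_)
        if until:
            clauses.append("datestamp <= ?")
            args.append(until)
        if clauses:
            query += " WHERE " + " AND ".join(clauses)
        for identifier, datestamp, title in conn.execute(query, args):
            yield {"identifier": identifier, "datestamp": datestamp, "title": title}

Serializing each yielded record into the OAI-PMH XML envelope is then mostly templating.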
[CODE4LIB] New Memento Internet Draft available; comments requested!
Dear all, We have published an updated internet draft for the Memento specification concerning Time Travel on the Web. It is available at: * TXT version: http://www.ietf.org/id/draft-vandesompel-memento-01.txt * HTML version: http://mementoweb.org/guide/rfc/ID/ This version contains updates and clarifications that result from community feedback, as well as new material pertaining to the discovery of TimeGates, TimeMaps, and Mementos. This is the technical specification behind our recent article in the Code4Lib journal on client development: http://journal.code4lib.org/articles/4979 We would be very appreciative of any feedback that the community might have on this specification, either on-list or privately. We have a Memento Google group if you would like to participate in an ongoing conversation on this topic: http://groups.google.com/group/memento-dev Many thanks, and we look forward to hearing your comments, Rob Sanderson, Herbert Van de Sompel, Michael Nelson.
Re: [CODE4LIB] Fwd: [Air-L] Using archives of the web for research
Our work on Memento comes to mind, of course. http://www.mementoweb.org/ And in particular, regarding the second point, our papers about the use of Memento for non-traditional interactions with web archives: * http://arxiv.org/abs/1003.3661 Using Memento to recover the state of a web resource at the point in time it was annotated, to ensure that the annotation is displayed with the correct representation. * http://arxiv.org/abs/1003.2643 Using Memento with Linked Data to perform time series analysis. And hopefully a paper at Open Repositories, describing initial and ongoing research, briefly summarized in: * http://public.lanl.gov/herbertv/papers/Papers/2011/MementoPoster_IKS_201104.pdf Hope that helps! Rob On Thu, Apr 21, 2011 at 8:58 AM, Jodi Schneider jschnei...@pobox.com wrote: Code4Lib, any thoughts for Eric? -Jodi -- Forwarded message -- From: Eric Meyer eric.me...@oii.ox.ac.uk Date: Wed, Apr 20, 2011 at 4:46 PM Subject: [Air-L] Using archives of the web for research To: ai...@listserv.aoir.org ai...@listserv.aoir.org Cc: Ralph Schroeder ralph.schroe...@oii.ox.ac.uk, a...@proteus-associates.com a...@proteus-associates.com Dear AoIR, OII is currently doing some work for the IIPC (International Internet Preservation Consortium: http://www.netpreserve.org), and part of the work involves identifying current and cutting edge research techniques and tools that are available for research on the live web, but that are currently either difficult or impossible to use with web archives such as the Internet Archive (http://www.archive.org/) or other IIPC member organisations. The short version of what we are hoping to get from this group: - do you know of any innovative uses of web archives for research? - what techniques for researching the live web should be adapted for use with web archives? - can you envisage any innovative uses of web archives (or other archived Internet data) for research that you would ideally like to be able to do? The longer version: What we are hoping is that members of AoIR will respond to us (off list) with your ideas about ways you research the live web that could potentially be enhanced using either snapshots from the web at different time points or longitudinal data about the web over time, but which would need additional support, training, tools, or infrastructure to be able to accomplish. Your responses will be used to influence the IIPC community to add web archive support for the kinds of cutting edge research that AoIR members are doing. Also, if you have any types of research or research questions you have been hoping to be able to do with archived internet data but have not been able to do for whatever reason, and you are willing to share the ideas and the barriers to researching them with us for possible inclusion in our discussion paper, that would be appreciated as well. Responses before 1 May will be most helpful. We will post the draft report back to the list in May, and the final report in the summer. Those interested in web archives may also find two reports we wrote last autumn to be of interest: Dougherty, M., Meyer, E.T., Madsen, C., van den Heuvel, C., Thomas, A., Wyatt, S. (2010). Researcher Engagement with Web Archives: State of the Art. London: JISC. Online: http://ssrn.com/abstract=1714997 or http://ie-repository.jisc.ac.uk/544/ Thomas, A., Meyer, E.T., Dougherty, M., van den Heuvel, C., Madsen, C., Wyatt, S. (2010). Researcher Engagement with Web Archives: Challenges and Opportunities for Investment. London: JISC. 
Online: http://ssrn.com/abstract=1715000 or http://ie-repository.jisc.ac.uk/543/ Eric T. Meyer Research Fellow, Oxford Internet Institute University of Oxford eric.me...@oii.ox.ac.uk http://people.oii.ox.ac.uk/meyer ___ The ai...@listserv.aoir.org mailing list is provided by the Association of Internet Researchers http://aoir.org Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org Join the Association of Internet Researchers: http://www.aoir.org/
[CODE4LIB] Fwd: [dm-l] Postdoctoral Fellowship at MARGOT, University of Waterloo
-- Forwarded message -- From: Christine McWebb cmcw...@uwaterloo.ca Date: Mon, Apr 18, 2011 at 7:36 AM Subject: [dm-l] Postdoctoral Fellowship at MARGOT, University of Waterloo To: d...@uleth.ca University of Waterloo – Mellon Postdoctoral Fellowship in Digital Humanities With apologies for cross-posting; please redistribute: Postdoctoral Fellowship at MARGOT The MARGOT Annotation Tool project (imageMAT), funded by the Andrew W. Mellon Foundation – Scholarly Communications and Technology Program (2011-2012), invites applications to its 2011 competition for a postdoctoral fellowship. imageMAT offers a one-year postdoctoral fellowship valued at $31,500 + 14% vacation pay and benefits to PhD students in the final year of their program and recent graduates. Applicants must have knowledge in medieval iconography and/or literature and manuscript culture/production. Applicants must also have solid computer skills. The postdoctoral fellow will provide scholarly leadership and, more generally, add scholarly content to the project site such as manuscript descriptions and blog posts. He/she will consult on content creation, and assist the developer and McWebb with the training of graduate students in content creation and be responsible for site moderation. Knowledge of French would be an asset, but is not required. The award is tenable at the University of Waterloo, Waterloo, Ontario, and is supervised by Christine McWebb. The start date is September 1, 2011. Applicants must not hold a tenure or tenure-track position or other full-time employment. Fellows are expected to engage in full-time postdoctoral research during the term of the award. Preference will be given to recent graduates, that is, to graduates applying within five years of receiving their doctoral degree. The awards are not renewable beyond the first year. Please send a cover letter, current c.v., and the names of three referees by email to: Christine McWebb cmcw...@uwaterloo.ca Application deadline: 1 June, 2011 Christine McWebb Associate Professor Associate Chair, Graduate Studies Département d'études françaises ML 337 University of Waterloo 200 University Avenue Waterloo, Ontario N2L 3G1 Canada T.: 519-888-4567x32426 http://margot.uwaterloo.ca Digital Medievalist -- http://www.digitalmedievalist.org/ Journal: http://www.digitalmedievalist.org/journal/ Journal Editors: editors _AT_ digitalmedievalist.org News: http://www.digitalmedievalist.org/news/ Wiki: http://www.digitalmedievalist.org/wiki/ Twitter: http://twitter.com/digitalmedieval Facebook: http://www.facebook.com/group.php?gid=49320313760 Discussion list: d...@uleth.ca Change list options: http://listserv.uleth.ca/mailman/listinfo/dm-l
[CODE4LIB] Fwd: OAC RFP Annoncement
Forwarded: The Open Annotation Collaboration (OAC) project is pleased to announce a Request For Proposal to collaborate with OAC researchers for building implementations of the OAC data model and ontology. The OAC is seeking to collaborate with scholars and/or librarians currently using and/or curating established repositories of scholarly digital resources with well-defined audiences of scholars. The OAC intends to fund a set of four projects that are complementary in content media type and use cases that leverage the OAC Data Model to the fullest extent, and that leverage existing annotation tools or at least have articulated an interesting scholarly annotation use case. Two of the successful Respondents will collaborate with OAC researchers at the University of Maryland and the other two will collaborate with OAC researchers at the University of Illinois at Urbana-Champaign. (For these collaborations, Illinois and Maryland will provide guidance on the implementation of the OAC data model and ontology, help in defining extensions of the data model that might be necessary, advice on existing tools that might be adaptable for the demonstration experiment, feedback on correctness of mappings from/to native annotation formats and/or annotations created.) The full text of the RFP can be found at http://www.openannotation.org/documents/openAnnotationRFP.pdf The IP agreement attachment to this RFP is available at: http://www.openannotation.org/documents/openAnnotationIP_Agreement_forRFP.pdf A FAQ about this RFP is available at: http://www.openannotation.org/RFP_FAQs.html Please make all submissions regarding this RFP, including your letter of intent and proposal, to oac2...@support.lis.illinois.edu Questions regarding any details of this RFP should also be emailed to oac2...@support.lis.illinois.edu; answers to substantive questions from individuals will be posted immediately on the RFP FAQ page mentioned above (so as to be available to all proposers). The Open Annotation Collaboration is supported by a grant from the Andrew W. Mellon Foundation. OAC members include the University of Illinois at Urbana-Champaign, the University of Maryland, the University of Queensland (Australia), and the Los Alamos National Laboratory. Regards, Jacob Jett Assistant Coordinator, Open Annotation Collaboration Project Center for Informatics Research in Science and Scholarship The Graduate School of Library and Information Science University of Illinois at Urbana-Champaign 501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
Re: [CODE4LIB] XML Schema vs Library APIs (OAI-PMH/SRU/unAPI)
That is (still) incorrect. A single schema may contain multiple namespaces, and there isn't a unique identifier for a schema. For example, any simple Dublin Core based syntax must have at least two namespaces, Dublin Core and the wrapper element. SchemaLocation is not unique as there can be many copies of the same schema. A single schema may define multiple root elements, such as MODS does with both item and collection level elements. Referring to your blog post, you can say how the four inter-relate: Schema Identifier uniquely identifies the format. Schema Location is a non-unique description of the format. Schema Name is a short, human-readable, non-unique name for the format, and Namespace is a non-unique namespace used by the format. This is just a rehash of a previous discussion on this list, between us: http://www.mail-archive.com/code4lib@listserv.nd.edu/msg05309.html So I guess I'm wasting my time ;) Rob Sanderson On Thu, Feb 24, 2011 at 9:44 AM, Jakob Voss jakob.v...@gbv.de wrote: Hi, We are developing a general API management tool to provide different APIs (unAPI, SRU, OAI-PMH...) with different record formats (MARC, MODS, DC...) to our databases. We now stumbled upon some confusion regarding XML formats. The basic question is: what is a format, and how do you refer to it? I came to the conclusion that at least SRU schema identifiers are useless. In addition you can extract XML namespace URIs from XML Schemas, so all you need to identify a format is a link to its XML Schema. I wrote a more detailed blog posting about this at http://jakoblog.de/2011/02/24/xml-schema-vs-library-apis-oai-pmhsruunapi/ Do any of you rely on SRU schema identifiers when consuming SRU? I think at least for XML-based formats we should only use the XML Schema as the authoritative reference. Sure there are different applications of variants of one schema, but then it makes no sense to use global identifiers in addition to local names. Jakob -- Jakob Voß jakob.v...@gbv.de, skype: nichtich Verbundzentrale des GBV (VZG) / Common Library Network Platz der Goettinger Sieben 1, 37073 Göttingen, Germany +49 (0)551 39-10242, http://www.gbv.de
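The "one schema, many namespaces" point is easy to check mechanically; here is a small Python sketch using only the standard library (pass it the schemaLocation of whatever format you are inspecting):

    import urllib.request
    import xml.etree.ElementTree as ET

    XS = "{http://www.w3.org/2001/XMLSchema}"

    def schema_namespaces(schema_location):
        """Return the namespace a schema defines plus the namespaces it imports."""
        with urllib.request.urlopen(schema_location) as f:
            root = ET.parse(f).getroot()
        target = root.get("targetNamespace")
        imported = [el.get("namespace") for el in root.iter(XS + "import")]
        return target, imported

Run against a simple Dublin Core wrapper schema, this reports the wrapper's own targetNamespace plus the imported Dublin Core namespace: two namespaces, one schema, so neither can identify the format on its own.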
[CODE4LIB] Using OAC Workshop Planned for March 2011 (CFP)
Dear all, The Open Annotation Collaboration (OAC) project is pleased to announce an open call for statements of interest in participating in the Using the OAC Model for Annotation Interoperability Workshop. The workshop will be held 24-25 March 2011 in Chicago, IL and will provide an in-depth introduction to the OAC data model and ontology for describing scholarly annotations of Web-accessible information resources. Use cases involving a range of scholarly annotation classes and target media types will be presented. Participants will be asked to examine, comment on, and provide feedback on how well the OAC data model and framework intersects (or fails to intersect) with domain-specific needs for scholarly annotation services and with existing discipline or repository-specific annotation tools and services. By the end of the day-and-a-half workshop, attendees will be better prepared to propose and undertake implementations of annotation tools and services exploiting the OAC data model and ontology. The workshop is planned for 9 AM March 24 through 1 PM March 25, 2011, in Chicago, Illinois. Limited support is available to reimburse invited participants for reasonable travel costs. Preliminary statements of interest and use case briefs are due by January 24, 2011. In the event of oversubscription, these briefs will be used to select invitees; invitations will be issued by February 7. Please see http://www.openannotation.org/documents/CallForWorkshopParticipation.pdf for additional details and context; contact Tim Cole (t-co...@illinois.edu) or Jacob Jett (jje...@illinois.edu) for further information. The Open Annotation Collaboration is supported by a grant from the Andrew W. Mellon Foundation. OAC members include the University of Illinois at Urbana-Champaign, the University of Maryland, the University of Queensland (Australia), and the Los Alamos National Laboratory.
Re: [CODE4LIB] graph processing stack
On Mon, Dec 20, 2010 at 1:28 PM, BRIAN TINGLE brian.tingle.cdlib.org@gmail.com wrote: graph processing stack on top of a graph database resonates with me more than RDF store with SPARQL access but I guess they are basically/functionally saying the same thing? Maybe the graph database way of thinking about it is a potentially less interoperable open data linking way? -- but I've always believed you have to operate before you can interoperate. An RDF Triple Store is a specific type of graph database, and SPARQL is a specific way to access it. Neo4J is another type of graph database, and the Gremlin/Pipes/Blueprints/Rexster stack is a way to access it. At the heart of the matter is how you model your graph, and RDF is the standard way to do that. You can store RDF in Neo4J and it has a SPARQL interface. In terms of operating and interoperating, it would seem to me that the easiest way forwards is to ignore any RDF ontologies you don't understand and simply create new relationships, such as snac:correspondedWith and snac:associatedWith ... you or other people can assert equivalencies later :) HTH, Rob Sanderson
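A minimal rdflib sketch of that operate-first approach; both vocabulary URIs below are made up for illustration:

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import OWL

    SNAC = Namespace("http://example.org/snac#")           # hypothetical
    OTHER = Namespace("http://example.org/other-vocab#")   # hypothetical

    g = Graph()
    ada = URIRef("http://example.org/person/ada")
    charles = URIRef("http://example.org/person/charles")

    # Operate first: assert the relationship you need today
    g.add((ada, SNAC.correspondedWith, charles))

    # Interoperate later: record the equivalence once a mapping is agreed
    g.add((SNAC.correspondedWith, OWL.equivalentProperty, OTHER.correspondedWith))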
Re: [CODE4LIB] What do you want out of a frbrized data web service?
Exposing the records as Linked Data, rather than just plain old XML, would be an interesting demonstration of how the library world can generate and, more importantly, curate massive amounts of data. They could then be linked to and from by other resources/services -- for example, linking a copy of a book on Amazon as an Item to the Manifestation it's drawn from could allow for powerful graph-oriented search. Rob On Tue, Apr 20, 2010 at 3:50 PM, Riley, Jenn jenlr...@indiana.edu wrote: Hi all, At Indiana University we're working on a project that will help us see concretely what FRBRized [1] library data and discovery systems might look like. [2] One of our project goals is to share the raw FRBRized data widely so that others can look at it to see how it's structured, reuse it, improve on it, comment on the FRBRization effectiveness, etc. We're planning on allowing remote/Web Services/API/SRU/some machine-to-machine method like that access to the data. As we're starting to think about how we should set that up, we thought it would be useful to gather some use cases from the code4lib community, as it's the folks here that are experimenting with services like this. So if there were FRBRized data available to you (at least for FRBR group 1 and group 2 entities; *maybe* group 3 as well), what would you do with it? What kinds of questions would your service (discovery system, whatever) ask a service that made this data available? What kinds of information would you want in a response? Would you have uses that called for downloading of all data at once or would you instead be better off with real-time queries to a web service? It's questions like that we're interested in brainstorming with this group about. Basically, what type of access to the data we're generating is most important, since we have finite resources to expend on this right now. Thanks, all! Jenn [1] http://www.loc.gov/cds/downloads/FRBR.PDF [2] http://vfrbr.info Jenn Riley Metadata Librarian Digital Library Program Indiana University - Bloomington Wells Library W501 (812) 856-5759 www.dlib.indiana.edu Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
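A sketch of that Amazon-to-Manifestation link in Python with rdflib, using the FRBR Core vocabulary's exemplarOf property (property name from memory; verify against the vocabulary), with hypothetical URIs throughout:

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    FRBR = Namespace("http://purl.org/vocab/frbr/core#")

    g = Graph()
    item = URIRef("http://www.amazon.com/dp/0000000000")             # a listing treated as an Item
    manifestation = URIRef("http://example.org/manifestation/1234")  # the library's Manifestation

    g.add((item, RDF.type, FRBR.Item))
    g.add((manifestation, RDF.type, FRBR.Manifestation))
    # the cross-site edge a graph-oriented search could traverse
    g.add((item, FRBR.exemplarOf, manifestation))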
Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
Depends on the sort of features required, in particular the access patterns, and the hardware it's going to run on. In my experience, NoSQL systems (for example Apache's Cassandra) have extremely good distribution properties over multiple machines, much better than SQL databases. Essentially, it's easier to store a bunch of key/values in a distributed fashion, as you don't need to do joins across tables (there aren't any) and eventually consistent systems (such as Cassandra) don't even need to always be internally consistent between nodes. If many concurrent write accesses are required, then NoSQL can also be a good choice, for the same reasons as it's easily distributed. And for the same reasons, it can be much faster than SQL systems with the same data given a data model that fits the access patterns. The flip side is that if later you want to do something that just requires the equivalent of table joins, it has to be done at the application level. This is going to be MUCH MUCH slower and harder than if there was SQL underneath. Rob On Mon, Apr 12, 2010 at 7:55 AM, Thomas Dowling tdowl...@ohiolink.edu wrote: So let's say (hypothetically, of course) that a colleague tells you he's considering a NoSQL database like MongoDB or CouchDB, to store a couple tens of millions of documents, where a document is pretty much an article citation, abstract, and the location of full text (not the full text itself). Would your reaction be: That's a sensible, forward-looking approach. Lots of sites are putting lots of data into these databases and they'll only get better. This guy's on the bleeding edge. Personally, I'd hold off, but it could work. Schedule that 2012 re-migration to Oracle or Postgres now. Bwahahahah!!! Or something else? (http://en.wikipedia.org/wiki/NoSQL is a good jumping-in point.) -- Thomas Dowling tdowl...@ohiolink.edu
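What "joins at the application level" means in practice, as a toy Python sketch with plain dicts standing in for key/value column families (all data invented):

    # Two "tables" held as key/value structures, the way a NoSQL store sees them
    citations = {
        "doc1": {"title": "On Time Travel", "journal_id": "j42"},
        "doc2": {"title": "Graphs Everywhere", "journal_id": "j42"},
    }
    journals = {
        "j42": {"name": "A Hypothetical Journal"},
    }

    # The join SQL would do for you, done by hand in application code
    for doc_id, cite in citations.items():
        journal = journals.get(cite["journal_id"], {})
        print(doc_id, cite["title"], "--", journal.get("name", "unknown"))

Trivial at this scale; against a real distributed store, each lookup is a network round trip per key, which is where the MUCH MUCH slower comes from.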
[CODE4LIB] Memento Updates
*** Apologies for cross-posting *** We are excited to share some news about the Memento (Time Travel for the Web) effort. Memento proposes to extend HTTP with datetime content negotiation as a means to better integrate the present and past Web. The Memento effort is partly funded by the Library of Congress. = The MementoFox add-on for Firefox browsers has been released. It allows time travel on the Web in a manner compliant with the Memento framework. * The MementoFox add-on can be downloaded at https://addons.mozilla.org/en-US/firefox/addon/100298. * Suggested Web time travels that can be undertaken using the add-on are described at http://www.mementoweb.org/demo/. They involve navigations for both the document Web and the Linked Data cloud. = There is also a Memento plug-in available for the MediaWiki platform. The plug-in provides support for Memento-style navigation of a Wiki's history pages. * The MediaWiki plug-in can be downloaded at http://www.mediawiki.org/wiki/Extension:Memento. * If you run a MediaWiki platform, please install this plug-in and let us know the URI of your Wiki. = Further pointers for recent Memento developments: * Memento site http://www.mementoweb.org. * Since Memento was first announced in November 2009, improvements have been made to the technical framework. Most notably, all of the concerns related to Web caching have been addressed such that the framework now takes maximal advantage of the existing caching infrastructure. Overviews of the framework are available via http://www.mementoweb.org/guide/. * Some major Web Archives have started working towards Memento support. See http://www.mementoweb.org/events/IA201002/. We are very interested in your feedback. Discussions are welcomed on the Memento list at http://groups.google.com/group/memento-dev/. On behalf of the Memento team: Herbert Van de Sompel - Los Alamos National Laboratory Michael L. Nelson - Old Dominion University Robert Sanderson - Los Alamos National Laboratory
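From the client side, Memento's datetime negotiation is a single request header; here is a sketch with the Python requests library against a hypothetical TimeGate URI (real endpoints are listed in the framework overviews above):

    import requests

    # Ask a TimeGate for the Memento of a page closest to a past datetime
    timegate = "http://example.org/timegate/http://www.ietf.org/"
    resp = requests.get(
        timegate,
        headers={"Accept-Datetime": "Tue, 20 Apr 2010 12:00:00 GMT"},
    )
    print(resp.url)                              # the Memento negotiated to
    print(resp.headers.get("Memento-Datetime"))  # its archival datetime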