Re: [CODE4LIB] COinS
Funny this topic comes up right now. A few days ago, Wikipedia (arguably the biggest provider of COinS) decided to discontinue it because they discovered that generating the COinS using their decrepit infrastructure uses up so much processing power that attempts to edit pages with lots of citations time out. See [1, 2]. That said, there is some movement to restore them once they get their act together and improve their infrastructure.

The big irony is that this move was driven by editors and regular contributors (it doesn't affect anyone not signed into Wikipedia), that is, exactly those users who *ought* to make the most regular use of COinS to actually retrieve cited material...

Just by coincidence, we finally embarked on a project to better process COinS. As is, we're just linking to the OpenURL resolver, which is hit and miss; that said, it's a facility that's used. We're now keeping statistics, and for just 10 editions we've had over 5,000 clicks in the last three months alone. But we have additional options - Link/360 being one for SS clients, and Summon another. We think we can do a much better job at resolving COinS with a combination of these services. None of this depends on the specific COinS format, of course - any suitable microformat would work, too.

- Godmar

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=19262
[2] https://en.wikipedia.org/wiki/Template_talk:Citation/core#Disappointed

On Tue, Nov 20, 2012 at 4:47 PM, Bigwood, David dbigw...@hou.usra.edu wrote:

I've used the COinS Generator at OCLC for years. Now it is gone. Any suggestions on how I can get an occasional COinS for use in our bibliography? Do any of the citation managers generate COinS? Or is this just an old unused metadata format that should be replaced by something else?

Thanks,
Dave Bigwood
dbigw...@hou.usra.edu
Lunar and Planetary Institute
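ps: re Dave's question about generating an occasional COinS by hand - that's only a few lines of scripting. A rough, untested sketch in Python; the KEV field names follow the Z39.88 book format, and the example citation is made up:

# Rough sketch: build a COinS span for a book citation.
# KEV field names per the Z39.88-2004 book format; adjust as needed.
from urllib.parse import urlencode
from html import escape

def coins_for_book(title, aulast, aufirst, date, isbn=None):
    kev = [
        ('ctx_ver', 'Z39.88-2004'),
        ('rft_val_fmt', 'info:ofi/fmt:kev:mtx:book'),
        ('rft.genre', 'book'),
        ('rft.btitle', title),
        ('rft.aulast', aulast),
        ('rft.aufirst', aufirst),
        ('rft.date', date),
    ]
    if isbn:
        kev.append(('rft.isbn', isbn))
    # escape() turns the & separators into &amp; for the HTML attribute
    return '<span class="Z3988" title="%s"></span>' % escape(urlencode(kev))

print(coins_for_book('Mansfield Park', 'Austen', 'Jane', '1814'))

Paste the resulting span into the bibliography's HTML; COinS-aware clients pick it up from there.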
Re: [CODE4LIB] COinS
Could you elaborate on your belief that COinS is actually illegal in HTML5? Why would that be so?

- Godmar

On Tue, Nov 20, 2012 at 5:20 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

It _IS_ an old unused metadata format that should be replaced by something else (among other reasons because it's actually illegal in HTML5), but I'm not sure there is a something else with the right balance of flexibility, simplicity, and actual adoption by consuming software. But COinS didn't have a whole lot of adoption by consuming software either.

Can you say what you think the COinS you've been adding are useful for, what they are getting used for? And what sorts of 'citations' you were adding them for? For my own curiosity, and because it might help answer if there's another solution that would still meet those needs.

But if you want to keep using COinS -- creating a COinS generator like OCLC's no longer existing one is a pretty easy thing to do; perhaps some code4libber reading this will be persuaded to find the time to create one for you and others. If you have a server that could host it, you could offer that. :)

On 11/20/2012 4:47 PM, Bigwood, David wrote:

I've used the COinS Generator at OCLC for years. Now it is gone. Any suggestions on how I can get an occasional COinS for use in our bibliography? Do any of the citation managers generate COinS? Or is this just an old unused metadata format that should be replaced by something else?

Thanks,
Dave Bigwood
dbigw...@hou.usra.edu
Lunar and Planetary Institute
Re: [CODE4LIB] Book metadata source
If it's only in the hundreds, why not just look them up in WorldCat via their basic search API and pull the ISBNs from the xISBN service? That's quickly scripted.

- Godmar

On Thu, Oct 25, 2012 at 3:05 PM, Cab Vinton bibli...@gmail.com wrote:

I have a list of several hundred book titles and corresponding authors, comprising our State Library's book group titles, and am looking for ways of putting these titles online in a way that would be useful to librarians and patrons. Something along the lines of a LibraryThing collection or Amazon wishlist.

Without ISBNs, however, the process could be very labor-intensive. Any suggestions for how we could handle this as part of a batch process? I realize that different manifestations of the same work will have different ISBNs, so we'd be seeking any work in print format, ideally the most commonly held.

The only thought I've had is to do a Z39.50 search using the author and title Bib-1 attributes, e.g. @and @attr 1=4 mansfield @attr 1=1003 austen.

Thanks for your thoughts,
Cab Vinton, Director
Sanbornton Public Library
Sanbornton, NH
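ps: a rough, untested sketch of the xISBN half of this, in Python. The endpoint and JSON response shape are assumed from OCLC's xID documentation; resolving a title/author pair to a seed ISBN would go through the WorldCat basic search API first, which requires a key:

# Rough sketch (untested): expand one ISBN into the set of related
# edition ISBNs via OCLC's xISBN service.
import json
from urllib.request import urlopen
from urllib.parse import quote

def editions(isbn):
    url = ('http://xisbn.worldcat.org/webservices/xid/isbn/%s'
           '?method=getEditions&format=json' % quote(isbn))
    data = json.load(urlopen(url))
    if data.get('stat') != 'ok':
        return []
    # each list entry carries one or more ISBNs
    return [i for entry in data.get('list', []) for i in entry.get('isbn', [])]

print(editions('0747591059'))

Loop that over the few hundred titles and pick, say, the first print edition returned.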
Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon)
On Wed, Oct 24, 2012 at 12:16 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

Looking at the major 'discovery' products, Summon, Primo, EDS ...all three will provide some results to un-authenticated users (the general public), but have some portions of the corpus that are restricted and won't show up in your results unless you have an authenticated user affiliated with customer's organization.

I brought this issue up on the Summon clients mailing list a few weeks ago. My impression from the resulting reaction was that people do not appear to be overly concerned about it, because

a) most queries come from on-campus,

b) the only results missing are those that come from re-published A&I databases (which don't allow unauthenticated access), which is a minority of content when compared to what is indexed by Summon itself, and

c) there's an option "Use Off Campus Sign In to access full text and more content" users can use to avoid the problem. Personally, I think it's little known, and insufficiently presented to the user ("more content").

The key problem is that as libraries are increasingly offering their discovery systems as OPAC replacements, users accustomed to the conventions used in OPACs do not expect this difference in behavior. OPACs generally show the same results independent of the user's authentication status, and do not require authentication just to search.

- Godmar
Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon)
On Wed, Oct 24, 2012 at 1:54 PM, Mark Mounts mark.mou...@dartmouth.edu wrote:

We have Summon at Dartmouth College. Authentication is IP based so with a Dartmouth IP address the user will see all our licensed content. There is also the option to see all the content Summon has beyond what we license by selecting the option "Add results beyond your library's collection".

That, according to my understanding, is not what Jonathan is talking about. You can select "Add results beyond your library's collection" while being unauthenticated/off-campus, but this still won't show you the same results. The results that are never displayed to unauthenticated users are those Summon republishes from A&I databases. "Add results beyond your library's collection" just adds (public) results from the holdings of other libraries; it doesn't add A&I results.

- Godmar
Re: [CODE4LIB] Q.: software for vendor title list processing
Thanks to everyone who replied to my question. From a brief examination, if I understand it correctly, KBART and ONIX create normative standards for how holdings data should be represented, which vendors (increasingly) follow. This leads to three follow-up questions.

First, is there software to translate/normalize existing vendor lists from vendors that have not yet adopted either of these standards into these formats? I'm thinking of a collection of adapters or converters, perhaps. Each would likely constitute a small effort, but there would be benefits from sharing development and maintenance.

Second, if holdings lists were provided in, or converted to, for instance the KBART format, what software understands these formats to further process them? In other words, is there immediate bang for the buck in adopting these standards?

Third, unsurprisingly, these efforts arose in the management of serials because holdings there change frequently depending on purchase agreements, etc. It is my understanding that eBooks are now posing similar collection management challenges. Are there separate normative efforts for eBooks, or is it believed that efforts such as KBART/ONIX can encompass eBooks as well?

- Godmar
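ps: to make the first question concrete, here is the shape such an adapter might take - an untested Python sketch. The vendor column names are invented for illustration, and the KBART field list is abbreviated:

# Rough sketch (untested): map one vendor's CSV column names onto
# KBART's tab-separated layout. Each vendor would get its own mapping.
import csv

KBART_FIELDS = ['publication_title', 'print_identifier', 'online_identifier',
                'date_first_issue_online', 'date_last_issue_online',
                'title_url', 'publisher_name']

VENDOR_X_MAP = {  # invented column names, for illustration only
    'Title': 'publication_title', 'ISSN': 'print_identifier',
    'eISSN': 'online_identifier', 'Coverage Start': 'date_first_issue_online',
    'Coverage End': 'date_last_issue_online', 'URL': 'title_url',
    'Publisher': 'publisher_name'}

def to_kbart(infile, outfile, mapping):
    with open(infile, newline='') as src, open(outfile, 'w', newline='') as dst:
        out = csv.DictWriter(dst, fieldnames=KBART_FIELDS,
                             delimiter='\t', extrasaction='ignore')
        out.writeheader()
        for row in csv.DictReader(src):
            # rename known columns; extrasaction drops everything unmapped
            out.writerow({mapping.get(k, k): v for k, v in row.items()})

to_kbart('vendorx.csv', 'vendorx_kbart.txt', VENDOR_X_MAP)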
[CODE4LIB] Q.: software for vendor title list processing
Hi,

at our library, there's an emerging need to process title lists from vendors for various purposes, such as checking that the titles purchased can be discovered via the discovery system and/or OPAC. It appears that the formats in which those lists are provided are non-uniform, as is the process of obtaining them. For example, one vendor - let's call them Expedition Scrolls - provides title lists for download as Excel files, which upon closer inspection turn out to be HTML tables. They are encoded using an odd mixture of CP1250 and HTML entities. Other vendors use entirely different formats.

My question is whether there are efforts, software, or anything related to streamlining the acquisition and processing of vendor title lists in software systems that aid in the collection development and maintenance process. Any pointers would be appreciated.

- Godmar
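ps: for the curious, this is roughly what it takes to get usable rows out of such a file - an untested Python sketch that assumes the whole file is a single HTML table (the filename is made up):

# Rough sketch (untested): extract rows from a vendor "Excel" file that
# is really an HTML table, encoded in CP1250 with HTML entities.
from html.parser import HTMLParser

class TableRows(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)   # resolves the entities
        self.rows, self.row, self.cell, self.in_cell = [], [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == 'tr': self.row = []
        if tag in ('td', 'th'): self.in_cell, self.cell = True, []
    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self.in_cell = False
            self.row.append(''.join(self.cell).strip())
        if tag == 'tr' and self.row:
            self.rows.append(self.row)
    def handle_data(self, data):
        if self.in_cell: self.cell.append(data)

p = TableRows()
p.feed(open('titlelist.xls', encoding='cp1250').read())  # hypothetical file
for row in p.rows:
    print(row)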
[CODE4LIB] isoncampus service
A number of web applications, both client- and server-side, could benefit if it could be easily determined whether a user is on or off campus with respect to accessing resources that use IP-address based authentication. For instance, a web site could show/hide a button asking the user to log in, or a proxied/non-proxied URL could be displayed depending on whether the user is connecting from within or outside an authorized IP range. This would reduce or eliminate the need for special proxy setups/unnecessary proxy use and could improve the user experience.

This is probably a problem for which many ad-hoc solutions exist on campuses, as well as solutions integrated into vendor-provided systems. It would be nice, and beneficial in particular to LibX, but presumably also to other software that is facing this problem, to have a reusable service implementation/response format that is easily deployable and requires only minimal effort for setup and maintenance. Maintenance should be as simple as maintaining a file with the IP ranges in a directory, like many libraries already do for their communication with database vendors or publishers.

My question is what existing ideas/standards/software exists for this purpose, if any, or what ideas/approaches others could share. I would like to point at a small piece of software I'm sharing, which is a PHP-based isoncampus service [1]; a demo is available here [2]. If anyone has a similar need and is interested in working together on a solution, this could be a seed around which to start. Besides the easily deployable PHP implementation, more efficient bindings/implementations for other languages and/or server/cloud environments could be created (AppEngine comes to mind.)

- Godmar

[1] https://github.com/godmar/isoncampus
[2] http://libx.lib.vt.edu/services/isoncampus/isoncampus.php

ps: as a side-note, OCLC's OpenURL registry used to include IP ranges as they were known to OCLC; this was at some point removed due to privacy concerns. I do note, however, that in general the ownership of IP ranges is public information, as are CIDR ranges, both of which are easily accessible via web services provided by arin.net or by the regional registries. Though mapping from an IP address to its owner is not the same as listing the IP ranges associated with an organization (many include multiple discontiguous CIDR ranges), I note that some of this information is also public via the BGP-advertised IP prefixes for an institution's (main) AS. In any event, no one would be forced to run this service if they have privacy concerns.
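pps: the core check itself is tiny in most languages. In Python, for instance (untested sketch; assumes the maintained file holds one IPv4 CIDR range per line):

# Rough sketch (untested): is a client address inside any of the
# library's maintained CIDR ranges?
import ipaddress

def is_on_campus(client_ip, ranges_file='ipranges.txt'):
    addr = ipaddress.ip_address(client_ip)
    with open(ranges_file) as f:
        return any(addr in ipaddress.ip_network(line.strip(), strict=False)
                   for line in f
                   if line.strip() and not line.startswith('#'))

# A service would wrap this in a JSON(P) response such as
# {"isoncampus": true}, keyed off the requesting client's IP.
print(is_on_campus('128.173.0.1'))   # sample address, made up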
Re: [CODE4LIB] WebOPAC/III Z39.50 PHP Query/PHPYAZ
Scraping III systems has got to be one of the most frequently repeated tasks in the history of coding librarianship. Majax2 [1, 2] is one such service, though (as of right now) it doesn't support search by call number. Here's an example ISBN search:

http://libx.lib.vt.edu/services/majax2/isbn/0747591059?opacbase=http://catalog.library.miami.edu/search

Since you have Summon, you could use their API. Examples are here [3, 4].

- Godmar

[1] http://libx.lib.vt.edu/services/majax2/
[2] http://code.google.com/p/majax2/
[3] http://libx.lib.vt.edu/services/summon/test.php
[4] http://libx.lib.vt.edu/services/summon/

On Wed, May 9, 2012 at 11:27 AM, Madrigal, Juan A j.madrig...@miami.edu wrote:

Hi,

I'm looking for a way to send a call number to WebOPAC via a query so that I can return data (title, author, etc.) for a specific book in the catalog, preferably in JSON or XML (I'll even take text at this point). I'm thinking that one way to accomplish this is via Z39.50, sending a query to the backend that powers WebOPAC. Has anyone done something similar to this? PHP YAZ (https://www.indexdata.com/phpyaz) looks promising, but I'd appreciate any guidance.

Thanks,

Juan Madrigal
Web Developer
Web and Emerging Technologies
University of Miami
Richter Library
Re: [CODE4LIB] Anyone using node.js?
On Tue, May 8, 2012 at 11:26 PM, Ed Summers e...@pobox.com wrote:

For both these apps the socket.io library for NodeJS provided a really nice abstraction for streaming data from the server to the client using a variety of mechanisms: web sockets, flash socket, long polling, JSONP polling, etc. NodeJS' event driven programming model made it easy to listen to the Twitter stream, or the ~30 IRC channels, while simultaneously holding open socket connections to browsers to push updates to--all from within one process. Doing this sort of thing in a more typical web application stack like Apache or Tomcat can get very expensive where each client connection is a new thread or process--which can lead to lots of memory being used.

We've also been using socket.io for our cloudbrowser project, with great success. The only drawback is that websockets don't (yet) support compression, but that's not node.js's fault. Another fault: you can't easily migrate open socket.io connections across processes (yet). FWIW, since you mention Rackspace - the lead student on the cloudbrowser project has now accepted a job at Rackspace (having turned down M$), in part because he finds their technology/environment more exciting.

I need to dampen the enthusiasm about memory use a bit. It's true that you're saving memory for additional threads etc., but - depending on your application - you're also paying for that, because V8 still lacks some opportunities for sharing that other environments have. For instance, if you run 25 Apache instances with, say, mod_whatever, they'll all share the code via a shared .so file. In Java/Tomcat, the JVM exploits, under the hood, similar sharing opportunities. V8/node.js, as of now, does not. This means that if you need to load libraries such as jQuery n times, you're paying a substantial price (we found on the order of 1-2MB per instance), because V8 will not do any code sharing under the hood. That said, whether you need to load it multiple times depends on your application - but that's another subtle and error-prone issue.

If you've done any JavaScript programming in the browser, it will seem familiar, because of the extensive use of callbacks. This can take some getting used to, but it can be a real win in some cases, especially in applications that are more I/O bound than CPU bound. Ryan Dahl (the creator of NodeJS) gave a presentation [4] to a PHP group last year which does a really nice job of describing how NodeJS is different, and why it might be useful for you. If you are new to event driven programming I wouldn't underestimate how much time you might spend feeling like you are turning our brain inside out.

The complications arising from event-based programming are an extensively written-about topic of research; one available approach is the use of compilers that provide a linear syntax for asynchronous calls. The TAME system, which originally arose from research at MIT, is one such example. Originally for C++, there's now a version for JavaScript available: http://tamejs.org/ Though I haven't tried it myself, I'm eager to, and would also like to know if someone else has. The tamejs.org site provides excellent reading on why/how you'd want to do this.

- Godmar
Re: [CODE4LIB] Anyone using node.js?
On Tue, May 8, 2012 at 10:17 AM, Ethan Gruber ewg4x...@gmail.com wrote:

Thanks. I have been working on a system that allows editing of RDF in web forms, creating linked data connections in the background, publishing to eXist and Solr for dissemination, and will eventually integrate operation with an RDF triplestore/SPARQL, all with Tomcat apps. I'm not sure it is possible to create, manage, and deliver our content with node.js, but I was told by the project manager that Apache, Java, and Tomcat were showing signs of age. I'm not so sure about this considering the prevalence of Tomcat apps both in libraries and industry. I happen to be very fond of Solr, and it seems very risky to start over in node.js, especially since I can't be certain the end product will succeed. I prefer to err on the side of stability. If anyone has other thoughts about the future of Tomcat applications in the library, or more broadly cultural heritage informatics, feel free to jump in. Our data is exclusively XML, so LAMP/Rails aren't really options.

We've used node.js (but not Express, their web app framework) to build our own experimental AJAX framework (http://cloudbrowser.cs.vt.edu/). We also have extensive experience with Tomcat-based systems. Given the wide, and increasing, use of node.js, I'm optimistic that it should be stable and reliable enough for your needs; let me emphasize a few points you may want to consider.

a) You're programming in JavaScript/CoffeeScript, which is a higher-level language than Java. My students are vastly more productive than in Java. The use of CoffeeScript and require still allows for maintainable code.

b) node.js is a single-threaded environment. This reduces the potential for some race conditions, but requires an asynchronous programming style. If you've done client-side AJAX, you'll find it familiar; otherwise, you need to adapt. It also creates new potential for race conditions of its own.

c) Scalability. Each node.js instance runs on a single core; modules exist for clustering on a single machine. I don't know/don't believe session state replication is as well supported as for Tomcat. On the other hand, Tomcat can be a setup nightmare (in my experience).

d) Supporting libraries. We've found the surrounding infrastructure excellent. A large community is developing for it: http://search.npmjs.org/. The cool thing is that many client-side libraries work or are easily ported (e.g., moment.js).

e) Doing XML in JavaScript. Though JavaScript as a language is intended to be embedded in XML documents, processing XML in JavaScript can be almost as awkward as in Java. JSON is clearly preferred and integrates very naturally.

- Godmar
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
On Mon, Mar 12, 2012 at 3:38 AM, Ed Summers e...@pobox.com wrote:

On Fri, Mar 9, 2012 at 12:12 PM, Godmar Back god...@gmail.com wrote:

Here's my hand ||*( [1]. ||*)

I'm sorry that I was so unhelpful w/ the "patches welcome" message on your docfix. You're right, it was antagonistic of me to suggest you send a patch for something so simple. Plus, it wasn't even accurate, because I actually wanted a pull request :-)

Here's a make-up pull request especially made for you :-)

https://github.com/edsu/pymarc/pull/25

- Godmar
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
On Thu, Mar 8, 2012 at 3:53 PM, Mark A. Matienzo m...@matienzo.org wrote:

On Thu, Mar 8, 2012 at 3:32 PM, Godmar Back god...@gmail.com wrote:

One side comment here; while smart handling/automatic detection of encodings would be a nice feature to have, it would help if pymarc could operate in an 'agnostic', or 'raw', mode where it would simply preserve the encoding that's there after a record has been read when writing the record. [Right now, pymarc does not have such a mode - if leader[9] == 'a', the data is unconditionally utf8-encoded on output as per mbklein's patch.]

Please feel free to write a patch and submit a pull request if you're able to contribute code to do this.

Mark,

while I would be able to contribute code to pymarc, I probably won't (unless my collaborators' needs with respect to pymarc become urgent). I've been contributing to open source for over 15 years, my first major contribution having been the ext2fs filesystem code in the FreeBSD kernel (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/filesystems-linux.html), and I'm a bit confused by how the spirit in the community has changed.

The phrase "patches welcome" used to be reserved for when there was a feature request somebody wanted, but you (the owner/maintainer of the software) didn't have the time or considered the problem not important. Back then, it used to be that all suggestions were welcome. For instance, if a user pointed out a typo, you'd fix it. Similarly, if a user or fellow developer pointed out a potential design flaw, you'd understand that you don't ask for patches, but that you go back to the drawing board and think about your software's design.

In pymarc's case, what's needed is not more code (it already has a moderately confusing set of almost a dozen switches for reading/writing), but a requirements analysis where you think about the use cases you want to support. For instance, whether you want to support reading/writing real-world records in batches (without touching them) even if they have flaws, or not. And/or whether you insist on interpreting a record's data in terms of encoding, always. That's something occasional contributors cannot do; it requires work by the core team, in discussion with frequent users.

(I would have liked to take this discussion to a pymarc-users list, but didn't find any.)

- Godmar
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
On Fri, Mar 9, 2012 at 10:37 AM, Michael B. Klein mbkl...@gmail.com wrote:

The internal discussion then becomes, "I have a need, and I've written something that satisfies it. I think it could also be useful to others, but I'm not going to have time to make major changes or implement features others need. Should I open source this or keep it to myself? Does freeing my code come with an implicit requirement to maintain and support it? Should it?"

It used to be that way; at least, it was this way when I grew up in open source (in the 90s, before Eric Raymond invented the term). And it makes sense for successful projects that have at least a moderate number of users. Just dumping your code on github helps very few people.

I'd vote open source just about every time. If someone sees the need and has the time to do a functional/requirements analysis and develop a core team around pymarc, more power to them. The code that's already there will give them a head start. Or they can start from scratch. Until then, it will remain a fork-patch-and-pull, community-supported project.

It's not just an agreement on design goals the core team must reach; it's also the issue of maintaining a record (in email discussions/posts and in the developers' minds) of what issues arose, what legacy decisions were made, and where backwards compatibility is required. That's something maintainers do; it enables them to reason about future design decisions. People who feel a sense of ownership and mental investment. Sure, I could throw in a flag 'dont_utf8_encode' to make the code work for my case. But it wouldn't improve the software. (In pymarc's case, I'd also recommend a discussion about data structures. For instance, what should the type of the elements of the subfield array be that's passed to a Field constructor? 8-bit strings or unicode objects? The thread you link to shows ambiguity here.)

Staying with fork-patch-and-pull may help individual people meet their individual needs, but it can prevent wide-spread adoption - and it creates frustration for users who may lack the expertise to track down encoding errors, or who are even unable to understand where the code they're using lives on their machine. Once a piece of software has reached the stage where it's distributed as a package (which pymarc, I believe, is), the distributors have taken on a piece of responsibility.

Related: being unwilling to fix even documentation typos unless someone clones the repository and delivers a pull request (on a silver platter?) seems unusual to me, but - perhaps I'm just too old and culturally out of tune with today's open source movement. (I'm not being ironic here; maybe there has been a shift and I should just get with it.)

- Godmar
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
On Fri, Mar 9, 2012 at 11:48 AM, Jon Gorman jonathan.gor...@gmail.com wrote:

Can't we all just shake hands virtually or something?

Here's my hand ||*( [1]. I overreacted, for which I'm sorry. (Also, I didn't see the entire github conversation until I just now visited the website; the github email notifications seem selective and only delivered Ed's replies (?) to my mailbox.)

- Godmar

[1] http://www.kadifeli.com/fedon/smiley.htm
[CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
Hi,

a few days ago, I showed pymarc to a group of technical librarians to demonstrate how easily certain tasks can be scripted/automated. Unfortunately, it blew up at me when I tried to write a record:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9: ordinal not in range(128)

Investigation revealed this culprit:

=LDR 00916nam a2200241I 4500
=001 ocm10685946
=005 19880203211447.0
=007 cr\bn||abp
=007 cr\bn||cda
=008 840503s1939gw00010\ger\d
=040 \\$aMBB$cMBB$dCRL
=049 \\$aCRLL
=100 10$aEsser, Hermann,$d1900-
=245 14$aDie jE8udischer Weltpest ;$bjudendE1ammerung auf dem Erdball,$cvon Hermann Esser.
=260 0\$aME8unchen,$bZentralverlag der N S D A P., F. Eher ahchf.,$c1939.
=300 \\$a243 [1] p.$c23 cm.
=533 \\$aAlso available as electronic reproduction.$bChicago :$cCenter for Research Libraries,$d[2009]
=650 \0$aJewish question.
=700 12$aBierbrauer, Johann Jacob,$d1705-1760?
=710 2\$aCenter for Research Libraries (U.S.)
=856 41$uhttp://dds.crl.edu/CRLdelivery.asp?tid=10538$zOnline version
=907 \\$a.b28931622$b08-30-10$c08-30-10
=998 \\$awww$b08-30-10$cm$dz$e-$fger$ggw $h4$i0

The leader[9] field is set to 'a', so the record should contain UTF8-encoded Unicode [1], but "E8 75" in the 245$a appears to be ANSEL, where 'E8' denotes the umlaut preceding the lowercase 'u' (0x75) [2]. To me, this record looks misencoded... am I correct here?

There are thousands of such records in the data set I'm dealing with, which was obtained using the 'Data Exchange' feature of III's Millennium system. My question is how others, especially pymarc users dealing with III records, deal with this issue, or whatever other experiences/hints/practices/kludges exist in this area.

Thanks.

- Godmar

[1] http://www.loc.gov/marc/bibliographic/bdleader.html
[2] http://lcweb2.loc.gov/diglib/codetables/45.html
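ps: to gauge how many records in an export file are affected, one can sidestep pymarc entirely and scan the raw ISO 2709 stream - an untested sketch:

# Rough sketch (untested): flag records that claim Unicode (leader/09 = 'a')
# but whose bytes are not valid UTF-8 - working on the raw ISO 2709 stream,
# so no library gets a chance to re-encode anything.
RT = b'\x1d'   # ISO 2709 record terminator

with open('export.mrc', 'rb') as f:    # hypothetical export file
    raw = f.read()

for n, rec in enumerate(raw.split(RT)):
    if len(rec) < 24:                  # skip trailing fragment, if any
        continue
    if rec[9:10] == b'a':              # leader/09 says UTF-8...
        try:
            rec.decode('utf-8')
        except UnicodeDecodeError:     # ...but the bytes say otherwise (MARC-8?)
            print('record %d: leader/09 = a, but not valid UTF-8' % n)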
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
On Thu, Mar 8, 2012 at 1:46 PM, Terray, James james.ter...@yale.edu wrote:

Hi Godmar,

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9: ordinal not in range(128)

Having seen my fair share of these kinds of encoding errors in Python, I can speculate (without seeing the pymarc source code, so please don't hold me to this) that it's the Python code that's not set up to handle the UTF-8 strings from your data source. In fact, the error indicates it's using the default 'ascii' codec rather than 'utf-8'. If it said "'utf-8' codec can't decode...", then I'd suspect a problem with the data. If you were to send the full traceback (all the gobbledy-gook that Python spews when it encounters an error) and the version of pymarc you're using to the program's author(s), they may be able to help you out further.

My question is less about the Python error, which I understand, than about the MARC record causing the error, and about how others deal with this issue (if it's a common issue, which I do not know.) But here's the long story from pymarc's perspective.

The record has leader[9] == 'a', but really, truly contains ANSEL-encoded data. When reading the record with a MARCReader(to_unicode=False) instance, the record reads OK since no decoding is attempted, but attempts at writing the record fail with the above error, since pymarc attempts to utf8-encode the ANSEL-encoded string, which contains non-ascii chars such as 0xe8 (the ANSEL umlaut prefix). It does so because leader[9] == 'a' (see [1]). When reading the record with a MARCReader(to_unicode=True) instance, it'll throw an exception during marc_decode when trying to utf8-decode the ANSEL-encoded string. Rightly so. I don't blame pymarc for this behavior; to me, the record looks wrong.

- Godmar

(ps: that said, what pymarc does fails in different circumstances - from what I can see, pymarc shouldn't assume that it's OK to utf8-encode the field data if leader[9] is 'a'. For instance, this would double-encode correctly encoded MARC/Unicode records that were read with a MARCReader(to_unicode=False) instance. But that's a separate issue that is not my immediate concern. pymarc should probably remember whether a record needs or does not need encoding when writing it, rather than consulting the leader[9] field.)

[1] https://github.com/mbklein/pymarc/commit/ff312861096ecaa527d210836dbef904c24baee6
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
On Thu, Mar 8, 2012 at 3:18 PM, Ed Summers e...@pobox.com wrote:

Hi Terry,

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry terry.re...@oregonstate.edu wrote:

This is one of the reasons you really can't trust the information found in position 9. This is one of the reasons why, when I wrote MarcEdit, I utilize a mixed process when working with data and determining characterset -- a process that reads this byte and takes the information under advisement, but in the end treats it more as a suggestion and one part of a larger heuristic analysis of the record data to determine whether the information is in UTF8 or not. Fortunately, determining if a set of data is in UTF8 or something else is a fairly easy process. Determining the something else is much more difficult, but generally not necessary.

Can you describe in a bit more detail how MarcEdit sniffs the record to determine the encoding? This has come up enough times w/ pymarc to make it worth implementing.

One side comment here; while smart handling/automatic detection of encodings would be a nice feature to have, it would help if pymarc could operate in an 'agnostic', or 'raw', mode where it would simply preserve the encoding that's there after a record has been read when writing the record. [Right now, pymarc does not have such a mode - if leader[9] == 'a', the data is unconditionally utf8-encoded on output as per mbklein's patch.]

- Godmar
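ps: until such a raw mode exists, a batch pass-through that never touches the encoding can be had by working on the raw ISO 2709 stream directly - an untested sketch (the filter shown matches on raw bytes only):

# Rough sketch (untested): copy selected records from one MARC file to
# another byte-for-byte, without ever decoding or re-encoding field data.
RT = b'\x1d'   # ISO 2709 record terminator

def copy_records(src, dst, keep=lambda rec: True):
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        for rec in fin.read().split(RT):
            if rec and keep(rec):
                fout.write(rec + RT)

# e.g., keep only the record with a given control number; the predicate
# sees raw bytes, so the match is byte-for-byte too
copy_records('export.mrc', 'subset.mrc', keep=lambda r: b'ocm10685946' in r)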
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens o...@ostephens.com wrote:

On 26 Feb 2012, at 14:42, Godmar Back wrote:

May I ask a side question and make a side observation regarding the harvesting of the full text of the object to which an OAI-PMH record refers? In general, is the idea to use the dc:source/text() element, treat it as a URL, and then expect to find the object there (provided that there was a suitable dc:type and dc:format element)?

I think dc:identifier is usually used to provide a URL for the item being described. The examples at http://www.openarchives.org/OAI/openarchivesprotocol.html#dublincore follow this, and the UK E-Thesis schema (http://naca.central.cranfield.ac.uk/ethos-oai/2.0/oai-uketd.xml) does as well.

Thanks. FWIW, the identifier contains the same URL as the source field in my example; but your interpretation of the identifier matches that found in the OAI-PMH spec at http://www.openarchives.org/OAI/openarchivesprotocol.html#UniqueIdentifier, where it also points out that it may not necessarily be a URL - it could be any URN, or even a DOI, as long as it relates the metadata to the underlying item.

This issue is certainly not unique to VT - we've come across this as part of our project.

I note that this means that providing the service point URL for the ETD OAI-PMH server is not sufficient to facilitate full-text harvesting/indexing by a provider such as Summon. (And sure enough, they've indexed only the metadata.) They would have to/will have to employ additional effort.

Re: your points about the right to full-text index. If indeed you're right that full-text indexing is a fair use (is it? Eric Hellman seems to indicate so: http://go-to-hellman.blogspot.com/2010/02/copyright-safe-full-text-indexing-of.html - as long as the technical definition of making a copy is met) - if that's indeed so, then of course the intentions of the author don't matter, at least in the US legal system. Otherwise, my point would have been that I'd like to see the signed ETD agreement forms extended to explicitly include the author's permission for full-text indexing.

- Godmar
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
On Mon, Feb 27, 2012 at 8:31 AM, Diane Hillmann metadata.ma...@gmail.com wrote:

On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens o...@ostephens.com wrote:

This issue is certainly not unique to VT - we've come across this as part of our project. While the OAI-PMH record may point at the PDF, it can also point to an intermediary page. This seems to be standard practice in some instances - I think because there is a desire, or even a requirement, that a user should see the intermediary page (which may contain rights information etc.) before viewing the full-text item. There may also be an issue where multiple files exist for the same item - maybe several data files and a pdf of the thesis attached to the same metadata record - as the metadata via OAI-PMH may not describe each asset.

This has been an issue since the early days of OAI-PMH, and many large providers provide such intermediate pages (arxiv.org, for instance). The other issue driving providers towards intermediate pages is that it allows them to continue to derive statistics from usage of their materials, which direct access URIs and multiple web caches don't. For providers dependent on external funding, this is a biggie.

Why do you place direct access URIs and multiple web caches into the same category? I follow your argument re: usage statistics for web caches, but as long as the item remains hosted in the repository, direct access URIs should still be counted (provided proper cache-control headers are sent.) Perhaps it would require server-side statistics rather than client-based GA.

Also, it seems to me that, except for Google, full-text indexing engines don't necessarily want to become providers of cached copies (certainly the discovery systems currently provided commercially don't, AFAIK.)

- Godmar
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
May I ask a side question and make a side observation regarding the harvesting of the full text of the object to which an OAI-PMH record refers?

In general, is the idea to use the dc:source/text() element, treat it as a URL, and then expect to find the object there (provided that there was a suitable dc:type and dc:format element)? Example: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl allows the harvesting of ETD metadata. Yet, its metadata reads:

<ListRecords>
  <metadata>
    <dc>
      <type>text</type>
      <format>application/pdf</format>
      <source>http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/</source>

When one visits http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/, however, there is no 'text' document of type 'application/pdf' - rather, it's an HTML title page that embeds links to one or more PDF documents, such as http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/unrestricted/Walker_1.pdf to Walker_5.pdf. Is VT's ETD OAI implementation deficient, or is OAI-PMH simply not set up to allow the harvesting of full text without what would basically amount to crawling the ETD title page, or other repository-specific mechanisms?

On a related note, regarding rights. As a faculty member, I regularly sign ETD approval forms. At Tech, students have three options to choose from: (a) open and immediate access, (b) restricted to VT for 1 year, (c) withhold access completely for 1 year for patent/security purposes. The current form does not allow student authors to address whether the full text of their dissertation may be harvested for the purposes of full-text indexing in such indexes as Google or Summon, nor does it allow them to restrict where copies are served from. Similarly, the dc:rights section in the OAI-PMH records addresses copyright only. In practice, Google crawls, indexes, and serves full-text copies of our dissertations.

- Godmar
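ps: for reference, harvesting the metadata itself is the easy part; it's getting from there to the PDFs that amounts to crawling. An untested Python sketch that lists the dc:identifier and dc:source values (a real harvester would also follow resumptionToken elements):

# Rough sketch (untested): pull dc:identifier and dc:source values
# from an OAI-PMH ListRecords response. Namespace URIs are the
# standard OAI-PMH / Dublin Core ones.
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {'oai': 'http://www.openarchives.org/OAI/2.0/',
      'dc': 'http://purl.org/dc/elements/1.1/'}

base = 'http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl'
url = base + '?' + urlencode({'verb': 'ListRecords', 'metadataPrefix': 'oai_dc'})
tree = ET.parse(urlopen(url))
for rec in tree.findall('.//oai:record', NS):
    for tag in ('identifier', 'source'):
        for el in rec.findall('.//dc:' + tag, NS):
            print(tag, el.text)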
Re: [CODE4LIB] Voting for c4l 2012 talks ends today
This site shows: "Ruby (Rack) application could not be started"

On Fri, Dec 9, 2011 at 11:50 AM, Anjanette Young youn...@u.washington.edu wrote:

Get your votes in before 5pm (PST): http://vote.code4lib.org/election/21 -- You will need your code4lib.org login in order to vote. If you do not have one, you can create one at http://code4lib.org/
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
On Tue, Dec 6, 2011 at 3:40 PM, Doran, Michael D do...@uta.edu wrote:

Current trends certainly go in the opposite direction, look at jQuery Mobile.

I agree that jQuery Mobile is very popular now. However, that in no way negates the caution. One could consider it as a tragedy of the commons in which a user's iPhone battery is the shared resource. Why should I as a developer (rationally consulting my own self-interest) conserve battery power that doesn't belong to me, just so some other developer's app can use that resource? I'm just playing the devil's advocate here. ;-)

You're taking it as a given that the use of JavaScript on a mobile device is significantly less energy-efficient than an approach that would exercise only the HTML parsing path. Be careful here; intuition can be misleading. Devices cannot send HTML to their displays. It takes energy to parse it, and energy to render it. Time is roughly proportional to energy. Where do you think most time/energy is spent: (page-provided) JavaScript execution, HTML parsing, or page layout/rendering? Based on the information I have available to me (I'd appreciate pointers to other studies), JS execution does not dominate - it ranks last, behind page layout and rendering [1], even for sites that are JS-heavy, such as webmail sites. Interestingly, a large part of that is evaluating CSS selectors.

On a related note, let me point out that there are many ways to change the DOM on the client. Client-side templating frameworks such as knockout.js or jQuery tmpl produce HTML (which then must be parsed), but modern AJAX frameworks such as ZK don't produce any HTML at all, skipping parsing altogether.

I meant to add another reason why at this point teaching newbies an AJAX style that relies on HTML-returning entry points is a really bad idea, and that is the move from read-only applications (like Nate's) to applications that actually update state on the server. In this case, multiple parts of the client page (perhaps a label here, a link there) need to be updated. Expressing this in HTML is cumbersome, to say the least. (As an aside, I note that AJAX frameworks such as ZK, which pursued the HTML approach in their first iterations, have moved away from it. Compare the client/server traffic of a ZK 3.x application to that of a ZK 5 app to see this.)

For those interested in one of the possible client-side approaches I'm suggesting, I prototyped Nate's application using only client-side templating: http://libx.lib.vt.edu/services/popsubjects/cs/ It uses knockout.js's data binding facilities as well as (due to qTip 1.0's design) the jQuery tmpl engine. Read the (small, self-contained) source to learn about the server-side entry points. (I should point out that in this case, the need for the book cover ISBNs to be retrieved remotely is somewhat contrived; they should probably be sent along with the page in the first place.) A side effect of this JSON-oriented design is that it results in 2 nice JSON-P web services that can be embedded/used in other pages/applications.

- Godmar

[1] http://www.eecs.berkeley.edu/~lmeyerov/projects/pbrowser/pubfiles/login.pdf
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
On Tue, Dec 6, 2011 at 8:38 AM, Erik Hatcher erikhatc...@mac.com wrote:

I'm with jrock on this one. But maybe I'm a luddite that didn't get the memo either (but I am credited for being one of the instrumental folks in the Ajax world, heh - in one or more of the Ajax books out there; us old timers called it "remote scripting").

On the in-jest rhetorical front, I'm wondering if referring to oneself as an old-timer helps in defending against insinuations that opposing technological change makes one a defender of the old ;-) But:

What I hate hate hate about seeing JSON being returned from a server for the browser to generate the view is stuff like: string = "<div>" + some_data_from_JSON + "</div>"; That embodies everything that is wrong about Ajax + JSON.

That's exactly why you use new libraries such as knockout.js, to avoid just that. Client-side template engines with automatic data bindings. Alternatively, AJAX frameworks use JSON and then interpret the returned objects as code. Take a look at the client/server traffic produced by ZK, for instance.

As Jonathan said, the server is already generating dynamic HTML... why have it return ...

It isn't. There is no "already generating" anything server; it's a new app Nate is writing. (Unless you count his work of the past two days.) The dynamic HTML he's generating is heavily tailored to his JS. There's extremely tight coupling, which now exists across multiple files written in multiple languages. Simply avoidable bad software engineering. That's not even making the computational-cost argument that avoiding template processing on the server is cheaper.

And with respect to Jonathan's argument about degradation: a degraded version of his app (presumably) would use <table> markup - or something like that; it'd look nothing like what he showed us yesterday.

Heh - the proof of the pudding is in the eating. Why don't we create 2 versions of Nate's app, one with mixed server/client - like the one he's completing now - and I create the client-side based one, and then we compare side by side? I'll work with Nate on that.

- Godmar

[I hope it's ok to snip off the rest of the email trail in my reply.]
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
On Tue, Dec 6, 2011 at 11:18 AM, Nate Hill nathanielh...@gmail.com wrote:

I attached the app as it stands now. There's something wrong w/ the regex matching in catscrape.php so only some of the images are coming through.

No, it's not the regexp. You're simply scraping Syndetics links without checking whether Syndetics has or does not have an image for those ISBNs. Searches where the first four hits have jackets display covers; the others don't.

Also: should I be sweating the fact that basically every time someone mouses over one of these boxes they are hitting our library catalog with a query? It struck me that this might be unwise. But I don't know either way.

Yes, it's unwise, especially since the results won't change (much).

- Godmar
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
On Tue, Dec 6, 2011 at 11:22 AM, Doran, Michael D do...@uta.edu wrote:

You had earlier asked the question whether to do things client- or server-side - well, in this example, the correct answer is to do it client-side. (Yours is a read-only application, where none of the advantages of server-side processing applies.)

One thing to take into consideration when weighing the advantages of server-side vs. client-side processing is whether the web app is likely to be used on mobile devices. Douglas Crockford, speaking about the fact that JavaScript has become the de facto universal runtime, cautions: "Which I think puts even more pressure on getting JavaScript to go fast. Particularly as we're now going into mobile. Moore's Law doesn't apply to batteries. So how much time we're wasting interpreting stuff really matters there. The cycles count." [1] Personally, I don't know enough to know how significant the impact would be. However, I understand Douglas Crockford knows a little something about JavaScript and JSON.

It's certainly true that limited energy motivates the need to minimize client processing, but the conclusion that this then means server generation of static HTML is not clear. Current trends certainly go in the opposite direction; look at jQuery Mobile.

- Godmar
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
On Tue, Dec 6, 2011 at 1:57 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

On 12/6/2011 1:42 PM, Godmar Back wrote:

Current trends certainly go in the opposite direction, look at jQuery Mobile.

Hmm, jQuery Mobile still operates on valid and functional HTML delivered by the server. In fact, one of the designs of jQuery Mobile is indeed to degrade to a non-JS version on feature phones (you know, e.g., flip phones with a web browser but probably no JavaScript). The non-JS version it degrades to is the same HTML that was delivered to the browser either way, just not enhanced by jQuery Mobile.

My argument was that current platforms, such as jQuery Mobile, heavily rely on JavaScript on the very platforms on which Crockford's statement points out it would be wise to save energy. Look at the jQuery Mobile documentation, A-grade platforms: http://jquerymobile.com/demos/1.0/docs/about/platforms.html

If I were writing AJAX requests for an application targeted mainly at jQuery Mobile... I'd be likely to still have the server deliver HTML to the AJAX request, then have JS insert it into the page and trigger jQuery Mobile enhancements on it.

I wouldn't. Return JSON and interpret or template the result.

- Godmar
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
FWIW, I would not send HTML back to the client in an AJAX request - that style of AJAX fell out of favor years ago. Send back JSON instead and keep the view logic client-side. Consider using a library such as knockout.js. Instead of your current (difficult to maintain) mix of PHP and client-side JavaScript, you'll end up with a static HTML page, a couple of clean JSON services (one for checked-out counts per subject, and one for the Syndetics ids of the first 4 covers), and clean HTML templates.

You had earlier asked the question whether to do things client- or server-side - well, in this example, the correct answer is to do it client-side. (Yours is a read-only application, where none of the advantages of server-side processing applies.)

- Godmar

On Mon, Dec 5, 2011 at 6:18 PM, Nate Hill nathanielh...@gmail.com wrote:

Something quite like that, my friend! Cheers N

On Mon, Dec 5, 2011 at 3:10 PM, Walker, David dwal...@calstate.edu wrote:

I gotcha. More information is, indeed, better. ;-)

So, on the PHP side, you just need to grab the term from the query string, like this:

$searchterm = $_GET['query'];

And then in your JavaScript code, you'll send an AJAX request, like:

http://www.natehill.net/vizstuff/catscrape.php?query=Cooking

Is that what you're looking for?

--Dave

David Walker
Library Web Services Manager
California State University

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate Hill
Sent: Monday, December 05, 2011 3:00 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

As always, I provided too little information. Dave, it's much more involved than that. I'm trying to make a kind of visual browser of popular materials from one of our branches from a .csv file. In order to display book covers for a series of searches by keyword, I query the catalog, scrape out only the syndetics images, and then display 4 of them. The problem is that I've hardcoded in a search for 'Drawing', rather than dynamically pulling the correct term and putting it into the catalog query.

Here's the work in process, and I believe it will only work in Chrome right now:

http://www.natehill.net/vizstuff/donerightclasses.php

I may have a solution; Jason's idea got me part way there. I looked all over the place for that little snippet he sent over! Thanks!

On Mon, Dec 5, 2011 at 2:44 PM, Walker, David dwal...@calstate.edu wrote:

And I want to update 'Drawing' to be 'Cooking' w/ a jQuery hover effect on the client side then I need to make an Ajax request, correct?

What you probably want to do here, Nate, is simply output the PHP variable in your HTML response, like this:

<h1 id="foo"><?php echo $searchterm ?></h1>

And then in your JavaScript code, you can manipulate the text through the DOM like this:

$('#foo').html('Cooking');

--Dave

David Walker
Library Web Services Manager
California State University

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate Hill
Sent: Monday, December 05, 2011 2:09 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] jQuery Ajax request to update a PHP variable

If I have in my PHP script a variable...

$searchterm = 'Drawing';

And I want to update 'Drawing' to be 'Cooking' w/ a jQuery hover effect on the client side then I need to make an Ajax request, correct? What I can't figure out is what that is supposed to look like... something like...

$.ajax({
  type: "POST",
  url: "myfile.php",
  data: ...not sure how to write what goes here to make it 'Cooking'...
});

Any ideas?
--
Nate Hill
nathanielh...@gmail.com
http://www.natehill.net
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
On Mon, Dec 5, 2011 at 6:45 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

I still like sending HTML back from my server. I guess I never got the message that that was out of style, heh.

I suppose there are always some stalwart defenders of the status quo ;-)

More seriously, I think I'd like to defend my statement. The purpose of graceful degradation is well-acknowledged - I don't think no-JS browsers are much of a concern, but web spiders are, and so are probably ADA accessibility requirements, as well as low-bandwidth environments. I do not believe, however, that such situations warrant any sharing of HTML templates. If they do, it means your app is, well, perhaps outdated in that it doesn't make full use of today's JS features. Certainly Gmail's basic HTML version for low-bandwidth environments shares no HTML templates with the main JS app.

In Nate's case, which is a heavily JS-dependent app (he uses various jQuery plug-ins to drive his layout, as well as qTip for tooltips), I find it difficult to see how any degraded environment would share any HTML with his app.

That said, I'm genuinely interested in what others are thinking/have experienced. Also, for expository purposes, I'd love to prototype the client side for Nate's app. Then we could compare the mixed PHP server/client-side AJAX version with the pure JS app I'm suggesting.

- Godmar

On Mon, Dec 5, 2011 at 6:45 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

I still like sending HTML back from my server. I guess I never got the message that that was out of style, heh. My server application already has logic for creating HTML from templates, and quite possibly already creates this exact same piece of HTML in some other place, possibly for use with non-AJAX fallbacks, or some other context where that snippet of HTML needs to be rendered. I prefer to re-use this logic that's already on the server, rather than have a duplicate HTML generating/templating system in the javascript too. It's working fine for me, in my use patterns.

Now, certainly, if you could eliminate any PHP generation of HTML at all, as I think Godmar is suggesting, and basically have a pure Javascript app -- that would be another approach that avoids duplication of HTML generating logic in both JS and PHP. That sounds fine too. But I'm still writing apps that degrade if you have no JS (including for web spiders that have no JS, for instance), and have nice REST-ish URLs, etc. If that's not a requirement and you can go all JS, then sure. But I wouldn't say that making apps that use progressive enhancement with regard to JS and degrade fine if you don't have it is out of style - or if it is, it ought not to be!

Jonathan
Re: [CODE4LIB] Examples of Web Service APIs in Academic Public Libraries
On Sat, Oct 8, 2011 at 1:40 PM, Patrick Berry pbe...@gmail.com wrote:

We're (CSU, Chico) using http://code.google.com/p/googlebooks/ to provide easy access to partial and full text books.

Good to hear. As an aside, we wrote up some background on how to use widgets and web services in a 2010 article published in LITA's ITAL magazine: http://www.lita.org/ala/mgrps/divs/lita/publications/ital/29/2/back.pdf

- Godmar

On Sat, Oct 8, 2011 at 10:33 AM, Michel, Jason Paul miche...@muohio.edu wrote:

Hello all,

I'm a lurker on this listserv and am interested in gaining some insight into your experiences of utilizing web service APIs in either an academic library or public library setting. I'm writing a book for ALA Editions on the use of Web Service APIs in libraries. Each chapter covers a specific API by delineating the technicalities of the API, discussing potential uses of the API in library settings, and step-by-step tutorials. I'm already including examples of how my library (Miami University in Oxford, Ohio) is utilizing these APIs, but would like to give the reader more examples from a variety of settings.

APIs covered in the book: Flickr, Vimeo, Google Charts, Twitter, Open Library, LibraryThing, Goodreads, OCLC.

So, what are you folks doing with APIs?

Thanks for any insight!

Kind regards,
Jason

--
Jason Paul Michel
User Experience Librarian
Miami University Libraries
Oxford, Ohio 45044
twitter: jpmichel
Re: [CODE4LIB] ny times best seller api
On Wed, Sep 28, 2011 at 5:02 PM, Michael B. Klein mbkl...@gmail.com wrote:

It's not NYTimes.com's fault; it's the cross-site scripting jerks who made the security necessary in the first place.

NYTimes could allow JSONP, but then developers would need to embed their API key in their web pages, which means the API key would simply be a token used for statistics rather than for authentication. It's their choice that they don't allow that.

Closer to the code4lib community: OCLC and Serials Solutions don't support JSONP in their web services, either, even though doing so would allow cool services and would likely not affect their business models adversely in any significant way, IMO. We should keep lobbying them to remove these restrictions, as I've been doing for a while.

- Godmar
Re: [CODE4LIB] ny times best seller api
Are you trying to run this inside a webpage served from a domain other than nytimes.com? If so, you'd need to use JSONP, which a cursory examination of their API documentation reveals they do not support. So, you need to use a proxy. Here's one:

$ cat hardcover.php
<?php
$cb = @$_GET['callback'];
$json = file_get_contents('http://api.nytimes.com/svc/books/v2/lists/hardcover-fiction.json?api-key=');
header("Content-Type: text/javascript");
echo $cb . '(' . $json . ')';
?>

Install it on your webserver, then change your JavaScript code to refer to it using callback=?. For instance, if you installed it on http://libx.lib.vt.edu/services/nytimes/hardcover.php then you would be using the URL http://libx.lib.vt.edu/services/nytimes/hardcover.php?callback=? ($.getJSON will replace the ? with a suitably generated function name).

- Godmar

On Wed, Sep 28, 2011 at 3:28 PM, Nate Hill nathanielh...@gmail.com wrote:

Anybody out there using the NY Times best seller API to do stuff on their library websites? I can't figure out what's wrong with my code here. Data is returned as null; I can't seem to parse the response with jQuery. Any help would be supercool. I removed the API key - my code doesn't actually contain ''.

Here's the jQuery:

jQuery(document).ready(function(){
  $(function(){
    // json request to new york times
    $.getJSON('http://api.nytimes.com/svc/books/v2/lists/hardcover-fiction.json?api-key=', function(data) {
      // loop through the results with the following function
      $.each(data.results.book_details, function(i, item){
        // turn the title into a variable
        var bookTitle = item.title;
        $('#container').append('<p>'+bookTitle+'</p>');
      });
    });
  });
});

Here's a snippet of the JSON response:

{
  "status": "OK",
  "copyright": "Copyright (c) 2011 The New York Times Company. All Rights Reserved.",
  "num_results": 35,
  "last_modified": "2011-09-23T12:00:29-04:00",
  "results": [{
    "list_name": "Hardcover Fiction",
    "display_name": "Hardcover Fiction",
    "updated": "WEEKLY",
    "bestsellers_date": "2011-09-17",
    "published_date": "2011-10-02",
    "rank": 1,
    "rank_last_week": 0,
    "weeks_on_list": 1,
    "asterisk": 0,
    "dagger": 0,
    "isbns": [{
      "isbn10": "0399157786",
      "isbn13": "9780399157783"
    }],
    "book_details": [{
      "title": "NEW YORK TO DALLAS",
      "description": "An escaped child molester pursues Lt. Eve Dallas; by Nora Roberts, writing pseudonymously.",
      "contributor": "by J. D. Robb",
      "author": "J D Robb",
      "contributor_note": "",
      "price": 27.95,
      "age_group": "",
      "publisher": "Putnam",
      "primary_isbn13": "9780399157783",
      "primary_isbn10": "0399157786"
    }],
    "reviews": [{
      "book_review_link": "",
      "first_chapter_link": "",
      "sunday_review_link": "",
      "article_chapter_link": ""
    }]

--
Nate Hill
nathanielh...@gmail.com
http://www.natehill.net
Re: [CODE4LIB] internet explorer and pdf files
On Wed, Aug 31, 2011 at 8:42 AM, Eric Lease Morgan emor...@nd.edu wrote: Eric wrote: Unfortunately IE's behavior is weird. The first time someone tries to load one of these URLs nothing happens. When someone tries to load another one, it loads just fine. When they re-try the first one, it loads. We are banging our heads against the wall here at Catholic Pamphlet Central. Networking issue? Port issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus off-campus issue? Thank you for all the replies. We're not one hundred percent positive, but we think the issue with IE has something to do with headers. As alluded to previously, IE needs/desires file name extensions in order to know what to do with incoming files. We are serving these PDF documents from Fedora which is sending out a stream, not necessarily a file. Apparently this confuses IE. Since Fedora is not really designed to be a file server, we will write a piece of intermediary software to act as a go-between. This isn't really a big deal since all of our other implementations of Fedora are expected to work in the same way. Wish us luck. FWIW, this is true for any and all HTTP servers. Only the client's request specifies a name (as the path component of the request, e.g., /fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1). The server's reply does not contain a name at all. It simply specifies the type and, typically, the length of the returned content. The returned content itself is just a blob of bytes. Your server says this blob of bytes is a PDF object (application/pdf), but it doesn't specify the length. Not specifying the length makes the job of the client slightly more difficult, which is why the HTTP/1.1 specification discourages it; the client now has to read the stream until the server closes the connection. It is certainly possible that IE's PDF plug-in is not prepared to deal with this situation; and I would certainly fix this first. - Godmar
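A go-between of the kind Eric describes only needs to buffer the datastream so it can send the headers Fedora omits. A rough sketch - written in Node.js for brevity, though Notre Dame's stack was presumably Java; the port and the URL-to-datastream mapping are made up for illustration:

// Minimal proxy: fetch the datastream from Fedora, buffer it, and
// re-serve it with explicit Content-Type, Content-Length, and a
// .pdf-suffixed file name for clients that key off the extension.
var http = require("http");

http.createServer(function (req, res) {
  // hypothetical mapping: /CATHOLLIC-PAMPHLET:1000793.pdf -> PDF1 datastream
  var id = req.url.replace(/^\//, "").replace(/\.pdf$/, "");
  http.get("http://fedoraprod.library.nd.edu:8080/fedora/get/" + id + "/PDF1",
    function (upstream) {
      var chunks = [];
      upstream.on("data", function (c) { chunks.push(c); });
      upstream.on("end", function () {
        var body = Buffer.concat(chunks);
        res.writeHead(200, {
          "Content-Type": "application/pdf",
          "Content-Length": body.length,  // the header Fedora leaves out
          "Content-Disposition": 'inline; filename="' + id + '.pdf"'
        });
        res.end(body);
      });
    });
}).listen(8081);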
Re: [CODE4LIB] internet explorer and pdf files
Earlier versions of IE were known to sometimes disregard the Content-Type (which you set correctly to application/pdf) and look at the suffix of the URL instead. For instance, they would render HTML if you served a .html as text/plain, etc. You may try creating URLs that end with .pdf Separately, you're not sending a Content-Length header:

HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: No-cache
Cache-Control: no-cache
Expires: Wed, 31 Dec 1969 19:00:00 EST
Content-Type: application/pdf
Date: Mon, 29 Aug 2011 19:47:27 GMT
Connection: close
Length: unspecified [application/pdf]

which disregards RFC 2616, http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13 On Mon, Aug 29, 2011 at 3:30 PM, Eric Lease Morgan emor...@nd.edu wrote: I need some technical support when it comes to Internet Explorer (IE) and PDF files. Here at Notre Dame we have deposited a number of PDF files in a Fedora repository. Some of these PDF files are available at the following URLs:

* http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1
* http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832898/PDF1
* http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:999332/PDF1
* http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832657/PDF1
* http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1001919/PDF1
* http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832818/PDF1
* http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:834207/PDF1

Retrieving the URLs with any browser other than IE works just fine. Unfortunately IE's behavior is weird. The first time someone tries to load one of these URLs nothing happens. When someone tries to load another one, it loads just fine. When they re-try the first one, it loads. We are banging our heads against the wall here at Catholic Pamphlet Central. Networking issue? Port issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus off-campus issue? Could some of y'all try to load some of the URLs with IE and tell me your experience? Other suggestions would be greatly appreciated as well. -- Eric Lease Morgan University of Notre Dame (574) 631-8604
Re: [CODE4LIB] dealing with Summon
On Tue, Mar 1, 2011 at 11:14 PM, Roy Tennant roytenn...@gmail.com wrote: On Tue, Mar 1, 2011 at 2:14 PM, Godmar Back god...@gmail.com wrote: Similarly, the date associated with a record can come in a variety of formats. Some are single-field (20080901), some are abbreviated (200811), some are separated into year, month, date, etc. Some records have a mixture of those. In this world of MARC (s/MARC/hurt/) I call that an embarrassment of riches. I've spent some bit of time parsing MARC, especially lately, and just the fact that Summon provides a normalized date element is HUGE. That's great to hear - but how do I know which elements to use? For instance, look at the JSON excerpt at http://api.summon.serialssolutions.com/help/api/search/response/documents

"PublicationDateCentury": [ "1900" ],
"PublicationDateDecade": [ "1970" ],
"PublicationDateYear": [ "1979" ],
"PublicationDate": [ "1979." ],
"PublicationDate_xml": [ { "day": "01", "month": "01", "text": "1979.", "year": "1979" } ],

Which one is the cleaned-up date, and in which order shall I be looking for the date field in the record when some or all of this information is missing in a particular record? Andrew responded to that: if given, PublicationDate_xml is the preferred one - but this raises the question of which field in PublicationDate_xml to use: .text, .day, or .year? What if some are missing? What if PublicationDate_xml is missing - do I then use or look for PublicationDate? Or is PublicationDateYear/Month/Decade preferred to PublicationDate? Which fields are derived from which others? These are the types of questions I'm looking to answer. - Godmar
Re: [CODE4LIB] dealing with Summon
On Wed, Mar 2, 2011 at 11:12 AM, Roy Tennant roytenn...@gmail.com wrote: Godmar, I'm surprised you're asking this. Most of the questions you want answered could be answered by a basic programming construct: an if-then-else statement and a simple decision about what you want to use in your specific application (for example, do you prefer text with the period, or not?). About the only question that such a solution wouldn't deal with is which fields are derived from which others, which strikes me as superfluous to your application if you know a hierarchy of preference. But perhaps I'm missing something here. I'm not asking how to code it, I'm asking for the algorithm I should use, given the fact that I'm not familiar with the provenance and status of the data Summon returns (which, I understand, is a mixture of original, harvested data, and cleaned-up, processed data.) Can you suggest such an algorithm, given the fact that each of the 8 elements I showed in the example (PublicationDateYear, PublicationDateDecade, PublicationDate, PublicationDateCentury, PublicationDate_xml.text, PublicationDate_xml.day, PublicationDate_xml.month, PublicationDate_xml.year) is optional? But wait - I think I've also seen records where there is a PublicationDateMonth, and records where some values have arrays of length 1. Can you suggest, or at least outline, such an algorithm? It would be helpful to know, for instance, if the presence of a PublicationDate_xml field supplants any other PublicationDate* fields (does it?) If a PublicationDate_xml field is absent, which field would I want to look at next? Is PublicationDate more reliable than a combination of PublicationDateYear and PublicationDateMonth (and perhaps PublicationDateDay if it exists)? If the PublicationDate_xml is present, should I prefer the .text option? What's the significance of that dot? Is it spurious, like the identifier you mentioned you find in raw MARC records? If not, what, if anything, is known about the presence of the other fields? What if multiple fields are given in an array? Is the ordering significant (e.g., the first one is more trustworthy)? Or should I sort them based on heuristics (e.g., if 20100523 and 201005 are given, prefer the former)? What if the data is contradictory? These are the questions I'm seeking answers to; I know that those of you who have coded their own Summon front-ends must have faced the same questions when implementing their record displays. - Godmar
Re: [CODE4LIB] dealing with Summon
On Wed, Mar 2, 2011 at 11:36 AM, Walker, David dwal...@calstate.edu wrote: Just out of curiosity, is there a Summon (API) developer listserv? Should there be? Yes, there is - I'm waiting for my subscription there to be approved. Like I said at the beginning of this thread, this is only tangentially a Code4Lib issue, and certainly the details aren't. But perhaps the general problem is (?) - Godmar
Re: [CODE4LIB] dealing with Summon
On Wed, Mar 2, 2011 at 11:54 AM, Demian Katz demian.k...@villanova.edu wrote: These are the questions I'm seeking answers to; I know that those of you who have coded their own Summon front-ends must have faced the same questions when implementing their record displays. Feel free to refer to VuFind's Summon template for reference if that is helpful: https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/interface/themes/default/Summon/record.tpl Andrew wrote this originally, and I've tweaked it in a few places to address problems as they arose. I don't claim that this offers the definitive answer to your questions... but it's working reasonably well for us so far. Ah, thanks. As they say, a piece of code speaks a thousand words! So, to solve the conundrum: only PublicationDate_xml and PublicationDate are of interest. If the former is given, use it and print (if available) its .month, .day, and .year fields. Else, if the latter is given, just print it. Ignore all other date-related fields. Ignore PublicationDate_xml.text. Ignore if there's more than one date field - use the first one. This knowledge will also help me avoid sending unnecessary data to the LibX client. As you know, Summon requires a proxy that talks to the actual service, and cutting out redundant and derived fields at the proxy could save a fair amount of bandwidth (though I'll have to check if it also shaves off latency.) A typical search response (raw JSON, with 20 hits) is 500KB long, so investing computing time at the proxy in cutting this down may be promising. - Godmar
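The rule just distilled fits in a few lines. A sketch, using the field names from the response excerpts earlier in the thread (the fallback order is the one implied by VuFind's template, not anything Serials Solutions documents):

// Pick a display date from a Summon record: prefer the normalized
// PublicationDate_xml entry, fall back to the raw PublicationDate string.
function displayDate(doc) {
  var xml = doc.PublicationDate_xml;
  if (xml && xml.length > 0) {
    var d = xml[0];                      // ignore any further entries
    return [d.month, d.day, d.year].filter(function (p) {
      return p !== undefined;            // each field is optional
    }).join("/");
  }
  var raw = doc.PublicationDate;
  return (raw && raw.length > 0) ? raw[0] : "";
}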
Re: [CODE4LIB] C4L2011 Voting for Prepared Talks
"through Dec 1" typically means until Dec 1, 23:59 (in some time zone) - yet the page says voting is closed. Could this be fixed? - Godmar On Mon, Nov 29, 2010 at 5:02 PM, McDonald, Robert H. rhmcd...@indiana.edu wrote: Just a reminder that voting for prepared talks for code4lib 2011 is ongoing and open through Dec 1, 2010. Please vote if you have not done so already. To vote - go here - http://vote.code4lib.org/election/index/17 If you have never voted before you will need to register here first - http://code4lib.org/user/register Thanks Robert ** Robert H. McDonald Associate Dean for Library Technologies and Digital Libraries Associate Director, Data to Insight Center-Pervasive Technology Institute Executive Director, Kuali OLE Frye Leadership Institute Fellow 2009 Indiana University Herman B Wells Library 234 1320 East 10th Street Bloomington, IN 47405 Phone: 812-856-4834 Email: rob...@indiana.edu Skype/GTalk: rhmcdonald AIM/MSN: rhmcdonald1
Re: [CODE4LIB] detecting user copying URL?
On Thu, Dec 2, 2010 at 12:25 AM, Susan Kane adarconsult...@gmail.com wrote: Absolutely this should be solved by the vendors / content providers but -- just for the sake of argument -- is it a possible extension for LibX? You can't send a standard message every time a user copies a URL from their address bar -- they would kill you. Is there a way for a browser plugin to know that the user is on a specific website and to warn them for such actions while there? Or would that level of coordination between the website and the address bar be (a) impossible or (b) not really worth the effort or (c) a serious privacy concern? Extensions such as LibX can certainly interpose when users bookmark items, at least in Firefox (and possibly Chrome). The question is how to determine if a URL is bookmarkable or not. This could be done either by consulting a database - online or built-in - or perhaps by using heuristics (for instance, URLs containing session ids are often not bookmarkable.) - Godmar
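The heuristic route might look something like the sketch below; the patterns are guesses at common session-id shapes, not a vetted database:

// Heuristic: flag URLs that likely embed a session id and therefore
// won't survive as bookmarks. Patterns are illustrative, not exhaustive.
function looksSessionBound(url) {
  var patterns = [
    /;jsessionid=/i,                      // Java servlet containers
    /[?&](sid|sessid|session(id)?)=/i,    // generic session parameters
    /[?&]PHPSESSID=/i                     // PHP's default
  ];
  return patterns.some(function (re) { return re.test(url); });
}

// looksSessionBound("http://example.org/record;jsessionid=ABC123") -> true
// looksSessionBound("http://example.org/record/42")                -> false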
[CODE4LIB] Q: Summon API Service?
Hi, Unlike Link/360, Serials Solutions' Summon API is extremely cumbersome to use - requiring, for instance, that requests be digitally signed. (*) Has anybody developed a proxy server for Summon that makes its API public (e.g. receives requests, signs them, forwards them to Summon, and relays the result back to a HTTP client?) Serials Solutions publishes some PHP5 and Ruby sample code in two API libraries (**), but these appear to be neither fully fledged nor easy-to-install solutions. ("Easy to install" here is defined as: an average systems librarian can download them, provide the API key, and have a running solution in less time than it takes to install Wordpress.) Thanks! - Godmar (*) http://api.summon.serialssolutions.com/help/api/authentication (**) http://api.summon.serialssolutions.com/help/api/code
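For reference, the signing step that makes such a proxy necessary is small. The sketch below (in Node.js) reflects one reading of the authentication docs linked above - HMAC-SHA1 over a newline-joined summary of the request - and every detail of the string-to-sign and header format should be treated as an unverified assumption to be checked against those docs:

var crypto = require("crypto");

// Sketch of Summon request signing: HMAC-SHA1 over a newline-joined
// request summary, base64-encoded. Field order per my reading of the docs.
function summonAuthHeader(accessId, secretKey, host, path, sortedQuery, date) {
  var idString = ["application/json", date, host, path, sortedQuery, ""].join("\n");
  var digest = crypto.createHmac("sha1", secretKey)
                     .update(idString)
                     .digest("base64");
  return "Summon " + accessId + ";" + digest;
}

// A proxy would compute this per request, attach it (plus the date) as
// headers, forward the request to the Summon API host, and relay the
// JSON response back to the browser.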
Re: [CODE4LIB] Safari extensions
On Fri, Aug 6, 2010 at 8:19 AM, Joel Marchesoni jma...@email.wcu.edu wrote: Honestly I try to switch to Chrome every month or so, but it just doesn't do what Firefox does for me. I've actually been using a Firefox mod called Pale Moon [1] that takes out some of the not so useful features for work (parental controls, etc) and optimizes for current processors. It's not a huge speed increase, but it is definitely noticeable. Chrome is certainly behind Firefox in its extension capability. For instance, it doesn't allow the extension of context menus yet (planned for later this year or next), and even the planned API will be less flexible than Firefox's. It is hobbled by the fact that the browser is not itself written using the same markup language as its extensions, so Google's programmers have to add an API (along with a C++ implementation) for every feature they want supported. Regarding JavaScript performance, both Firefox and Chrome have just-in-time compilers in their engines (Chrome uses V8, Firefox uses TraceMonkey), which each provide an order of magnitude or two of speedup compared to the interpreters that were used in FF 3.0 and before. Regarding resource usage, it's difficult to tell. Firefox is certainly a memory hog, with internal memory leaks, but when the page itself is the issue (perhaps because the JavaScript programmer leaked memory), then both browsers are affected. In Chrome, I've observed two problems. First, if a page leaks, then the corresponding tab will simply ask for more memory from the OS. There are no resource controls at this point. The effect is the same as in Firefox. Second, each page is scheduled separately by the OS. I've observed that Chrome tabs grind to a halt in Windows XP because the OS starves a tab's thread if there are CPU-bound activities on the machine, making Chrome actually very difficult to use. - Godmar
Re: [CODE4LIB] Safari extensions
No, nothing beyond a quick read-through. The architecture is similar to Google Chrome's - which is perhaps not surprising given that both Safari and Chrome are based on WebKit - which for us at LibX means we should be able to leverage the redesign we did for LibX 2.0. A notable characteristic of this architecture is that content scripts that interact with a page are in a separate OS process from the main extension's code; thus they have to communicate with the main extension via message passing rather than through direct method calls as in Firefox. - Godmar On Thu, Aug 5, 2010 at 4:04 PM, Eric Hellman e...@hellman.net wrote: Has anyone played with the new Safari extensions capability? I'm looking at you, Godmar. Eric Hellman President, Gluejar, Inc. 41 Watchung Plaza, #132 Montclair, NJ 07042 USA e...@hellman.net http://go-to-hellman.blogspot.com/ @gluejar
Re: [CODE4LIB] Safari extensions
On Thu, Aug 5, 2010 at 4:15 PM, Raymond Yee y...@berkeley.edu wrote: Has anyone given thought to how hard it would be to port Firefox extensions such as LibX and Zotero to Chrome or Safari? (Am I the only one finding Firefox to be very slow compared to Chrome?) We have ported LibX to Chrome, see http://libx.org/releases/gc/ Put briefly, Chrome provides an extension API that is entirely JavaScript/HTML based. As such, existing libraries such as jQuery can be used to implement the extensions' user interface (such as LibX's search box, implemented as a browser action). Unlike Firefox, no coding in a special-purpose user interface markup language such as XUL is required. (That said, it's possible to achieve the same in Firefox, and in fact we're now using the same HTML/JS code in Firefox, reducing the XUL-specific code to a minimum.) Chrome also supports content scripts that interact with the page a user is looking at. These scripts live in an environment that is similar to the environment seen by client-side code coming from the origin. In this sense, it's very similar to how Firefox works with its sandboxes, with the exception mentioned in my previous email that all communication with the outside has to be done via message passing (sending JSON-encoded objects back and forth). - Godmar
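Concretely, that message passing looks like the sketch below, using the sendRequest/onRequest API Chrome offered at the time (since superseded by chrome.runtime.sendMessage/onMessage); only JSON-serializable objects can cross the boundary:

// In a content script: ask the main extension for something the page
// context cannot do itself, e.g. a cross-domain lookup.
chrome.extension.sendRequest({ type: "lookupISSN", issn: "0028-0836" },
  function (response) {
    console.log("holdings: " + response.holdings);
  });

// In the background page: handle the request and send a reply.
chrome.extension.onRequest.addListener(function (request, sender, sendResponse) {
  if (request.type === "lookupISSN") {
    // ... perform the lookup, then:
    sendResponse({ holdings: "available online" });  // illustrative payload
  }
});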
Re: [CODE4LIB] SerSol 360Link API?
I wrote a to-JSON proxy a while ago: http://libx.lib.vt.edu/services/link360/index.html I found that Link/360 doesn't handle load very well. Even a small burst of requests leads to a spike in latency and error responses. I asked SS if this was a bug or part of some intentional throttling attempt, but never received a reply. Didn't pursue it further. - Godmar On Mon, Apr 19, 2010 at 2:42 AM, David Pattern d.c.patt...@hud.ac.uk wrote: Hiya We're using it to add e-holdings into our OPAC, e.g. http://library.hud.ac.uk/catlink/bib/396817/ I've also tried using the API to add the coverage info to the availability text for journals in Summon (e.g. Availability: print (1998-2005) electronic (2000-present)). I've made quite a few tweaks to our 360 Link (mostly using jQuery), so I'm half tempted to have a go using the API to develop a complete replacement for 360 Link. If anyone's already done that, I'd be keen to hear more. regards Dave Pattern University of Huddersfield From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan Rochkind [rochk...@jhu.edu] Sent: 19 April 2010 03:50 To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] SerSol 360Link API? Is anyone using the SerSol 360Link API in a real-world production or near-production application? If so, I'm curious what you are using it for, what your experiences have been, and in particular if you have information on typical response times of their web API. You could reply on list or off list just to me. If I get interesting information especially from several sources, I'll try to summarize on list and/or blog either way. Jonathan --- This transmission is confidential and may be legally privileged. If you receive it in error, please notify us immediately by e-mail and remove it from your system. If the content of this e-mail does not relate to the business of the University of Huddersfield, then we do not endorse it and will accept no liability.
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't come close to meeting the requirements. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php', which I couldn't locate - Google once indexed it at badgerfish.ning.com - Godmar
[CODE4LIB] Q: XML2JSON converter
Hi, Can anybody recommend an open source XML2JSON converter in PHP or Python (or potentially other languages, including XSLT stylesheets)? Ideally, it should implement one of the common JSON conventions, such as Google's JSON convention for GData [1], but anything that preserves all elements, attributes, and text content of the XML file would be acceptable. Note that json_encode(simplexml_load_file(...)) does not meet this requirement - in fact, nothing based on simplexml_load_file() will. (It can't even load MarcXML correctly.) Thanks! - Godmar [1] http://code.google.com/apis/gdata/docs/json.html
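For concreteness, here is roughly what such a lossless conversion looks like, following the GData convention cited above (attributes become properties, text content goes under "$t", repeated children become arrays). The sketch is browser-side JavaScript; a PHP or Python version would walk the DOM the same way:

// Convert an XML DOM element to a JSON-friendly object, GData-style.
function xmlToJson(el) {
  var obj = {};
  for (var i = 0; i < el.attributes.length; i++) {
    var a = el.attributes[i];
    obj[a.name] = a.value;               // attributes become properties
  }
  var text = "";
  for (var n = el.firstChild; n; n = n.nextSibling) {
    if (n.nodeType === 3) {              // text node
      text += n.nodeValue;
    } else if (n.nodeType === 1) {       // element node
      var child = xmlToJson(n);
      if (obj[n.nodeName] === undefined) {
        obj[n.nodeName] = child;
      } else {                           // repeated element -> array
        if (!(obj[n.nodeName] instanceof Array)) {
          obj[n.nodeName] = [obj[n.nodeName]];
        }
        obj[n.nodeName].push(child);
      }
    }
  }
  if (/\S/.test(text)) obj["$t"] = text; // text content under "$t"
  return obj;
}

// var doc = new DOMParser().parseFromString(xmlString, "text/xml");
// JSON.stringify(xmlToJson(doc.documentElement));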
Re: [CODE4LIB] Q: what is the best open source native XML database
On Tue, Jan 19, 2010 at 10:09 AM, Sean Hannan shan...@jhu.edu wrote: I've had the best experience (query speed, primarily) with BaseX. This was primarily for large XML document processing, so I'm not sure how much it will satisfy your transactional needs. I was initially using eXist, and then switched over to BaseX because the speed gains were very noticeable. What about the relative maturity/functionality of eXist vs BaseX? I'm a bit skeptical about putting my eggs in a University project basket not backed by a continuous revenue stream (... did I just say that out loud?) - Godmar
[CODE4LIB] Q: what is the best open source native XML database
Hi, we're currently looking for an XML database to store a variety of small-to-medium sized XML documents. The XML documents are unstructured in the sense that they do not follow a schema or DTD, and that their structure will be changing over time. We'll need to do efficient searching based on elements, attributes, and full text within text content. More importantly, the documents are mutable. We'd like to bring documents or fragments into memory in a DOM representation, manipulate them, then put them back into the database. Ideally, this should be done in a transaction-like manner. We need to efficiently serve document fragments over HTTP, ideally in a manner that allows for scaling through replication. We would prefer strong support for Java integration, but it's not a must. Have others encountered similar problems, and what have you been using? So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ), Base-X (http://www.basex.org/ ), MonetDB/XQuery (http://www.monetdb.nl/XQuery/ ), Sedna (http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few others here: http://en.wikipedia.org/wiki/XML_database I'm wondering to what extent systems such as Lucene, or even digital object repositories such as Fedora, could be coaxed into this usage scenario. Thanks for any insight you have or experience you can share. - Godmar
Re: [CODE4LIB] ipsCA Certs
Hi, in my role as unpaid tech advisor for our local library, may I ask a question about the ipsCA issue? Is my understanding correct that ipsCA currently reissues certificates [1] signed with a root CA that is not yet in Mozilla products, due to IPS's delaying the necessary vetting process [2]? In other words, Mozilla users would see security warnings even if a reissued certificate was used? The reason I'm confused is that I, like David, saw a number of still valid certificates from IPS Internet publishing Services s.l. already shipping with Firefox, alongside the now-expired certificate. But I suppose those certificates are for something else and the reissued certificates won't be signed using them? Thanks, - Godmar [1] http://certs.ipsca.com/Support/hierarchy-ipsca.asp [2] https://bugzilla.mozilla.org/show_bug.cgi?id=529286 On Thu, Dec 17, 2009 at 4:02 PM, John Wynstra john.wyns...@uni.edu wrote: Out of curiosity, did anyone else using ipsCA certs receive notification that due to the coming expiration of their root CA (December 29, 2009), they would need a reissued cert under a new root CA? I am uncertain as to how this new Root CA will become a part of the browsers' trusted roots without some type of user action including a software upgrade, but the following library website instructions lead me to believe that this is not going to be smooth. http://bit.ly/53Npel We are just about to go live with EZProxy in January with an ipsCA cert issued a few months ago, and I am not about to do that if I have serious browser support issues. -- John Wynstra Library Information Systems Specialist Rod Library University of Northern Iowa Cedar Falls, IA 50613 wyns...@uni.edu (319)273-6399
Re: [CODE4LIB] Character problems with tictoc
The string in question is double-encoded, that is, a string that's in UTF-8 already was run through a UTF-8 encoder. The string is "Acta Ortopédica", where the 'é' is really '\u00e9' aka 'Latin Small Letter E with Acute'. [1] In UTF-8, the e-acute is two-byte encoded as C3 A9. If you run the bytes C3 A9 through a UTF-8 encoder, C3 ('\u00c3' - Capital A with Tilde) becomes C3 83, and A9 (copyright sign, '\u00a9') becomes C2 A9. C3 83 C2 A9 is exactly what JISC is serving; what it should be serving is C3 A9. Send email to them. - Godmar [1] http://www.utf8-chartable.de/ 2009/12/21 Glen Newton glen.new...@nrc-cnrc.gc.ca [I realise there was a recent related 'Character-sets for dummies'[1] discussion recently] I am using tictocs[2] list of journal RSS feeds, and I am getting gibberish in places for diacritics. Below is an example: in emacs: 221 Acta Ortop dica Brasileira http://www.scielo.br/rss.php?pid=1413-7852&lang=en 1413-7852 in Firefox: 221 Acta OrtopÃ©dica Brasileira http://www.scielo.br/rss.php?pid=1413-7852&lang=en 1413-7852 Note that the emacs view is both of a save of the Firefox, and from a direct download using 'wget'. Is this something on my end, or are the tictocs people not serving proper UTF-8? The HTTP header from wget claims UTF-8:

wget -S http://www.tictocs.ac.uk/text.php
--2009-12-21 12:47:59-- http://www.tictocs.ac.uk/text.php
Resolving www.tictocs.ac.uk... 130.88.101.131
Connecting to www.tictocs.ac.uk|130.88.101.131|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Mon, 21 Dec 2009 17:42:05 GMT
Server: Apache/2.2.13 (Unix) mod_ssl/2.2.13 OpenSSL/0.9.8k PHP/5.3.0 DAV/2
X-Powered-By: PHP/5.3.0
Content-Type: text/plain; charset=utf-8
Connection: close
Length: unspecified [text/plain]

stuff removed Can someone validate if they are also experiencing this issue? Thanks, Glen [1] https://listserv.nd.edu/cgi-bin/wa?S2=CODE4LIB&q=&s=character-sets+for+dummies&f=&a=&b= [2] http://www.tictocs.ac.uk/text.php -- Glen Newton | glen.new...@nrc-cnrc.gc.ca Researcher, Information Science, CISTI Research NRC W3C Advisory Committee Representative http://tinyurl.com/yvchmu tel/tél: 613-990-9163 | facsimile/télécopieur 613-952-8246 Canada Institute for Scientific and Technical Information (CISTI) National Research Council Canada (NRC) | M-55, 1200 Montreal Road http://www.nrc-cnrc.gc.ca/ Institut canadien de l'information scientifique et technique (ICIST) Conseil national de recherches Canada | M-55, 1200 chemin Montréal Ottawa, Ontario K1A 0R6 Government of Canada | Gouvernement du Canada --
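If the publisher can't be persuaded to fix it, one layer of double-encoding can also be undone on the consuming side. In JavaScript the old escape/decodeURIComponent pair performs exactly the byte reinterpretation described above:

// Undo one layer of UTF-8 double-encoding: treat each character's code
// point as a raw byte (escape), then decode those bytes as UTF-8.
function fixDoubleEncoded(s) {
  return decodeURIComponent(escape(s));  // escape is legacy but ubiquitous
}

fixDoubleEncoded("Acta Ortop\u00c3\u00a9dica");  // -> "Acta Ortopédica"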
Re: [CODE4LIB] Character problems with tictoc
I believe they've changed it while we were having the discussion. When I downloaded the file (with curl), it looked like this:

0020700   r   t   o   p   C etx   B   )   d   i   c   a  sp   B   r   a
         72  74  6f  70  c3  83  c2  a9  64  69  63  61  20  42  72  61
0020720   s   i   l   e   i   r   a  ht   h   t   t   p   :   /   /   w
         73  69  6c  65  69  72  61  09  68  74  74  70  3a  2f  2f  77

- Godmar On Mon, Dec 21, 2009 at 2:24 PM, Erik Hetzner erik.hetz...@ucop.edu wrote: At Mon, 21 Dec 2009 14:09:28 -0500, Glen Newton wrote: It seems that different people are seeing different things in their respective viewers (i.e some are OK and others are like what I am seeing). When I use wget and view the local file in Firefox (3.0.4, Linux Suse 11.0) I see: http://cuvier.cisti.nrc.ca/~gnewton/tictoc1.gif [gif used as it is not lossy] The text is clearly not correct. The file I got with wget is: http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt Is this just a question of different client software (and/or OSes) viewing or mangling the content? When dealing with character set issues (especially the dreaded double-encoding!) I find it best to use hex editors or dumpers. If in emacs, try M-x hexl-find-file. On a Unix command line, the od or hd commands are useful. For the record:

0000  48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d  |HTTP/1.1 200 OK.|
0010  0a 44 61 74 65 3a 20 4d 6f 6e 2c 20 32 31 20 44  |.Date: Mon, 21 D|
0020  65 63 20 32 30 30 39 20 31 39 3a 32 32 3a 33 38  |ec 2009 19:22:38|
0030  20 47 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70  | GMT..Server: Ap|
0040  61 63 68 65 2f 32 2e 32 2e 31 33 20 28 55 6e 69  |ache/2.2.13 (Uni|
0050  78 29 20 6d 6f 64 5f 73 73 6c 2f 32 2e 32 2e 31  |x) mod_ssl/2.2.1|
0060  33 20 4f 70 65 6e 53 53 4c 2f 30 2e 39 2e 38 6b  |3 OpenSSL/0.9.8k|
0070  20 50 48 50 2f 35 2e 33 2e 30 20 44 41 56 2f 32  | PHP/5.3.0 DAV/2|
0080  0d 0a 58 2d 50 6f 77 65 72 65 64 2d 42 79 3a 20  |..X-Powered-By: |
0090  50 48 50 2f 35 2e 33 2e 30 0d 0a 43 6f 6e 74 65  |PHP/5.3.0..Conte|
00a0  6e 74 2d 54 79 70 65 3a 20 74 65 78 74 2f 70 6c  |nt-Type: text/pl|
00b0  61 69 6e 3b 20 63 68 61 72 73 65 74 3d 75 74 66  |ain; charset=utf|
00c0  2d 38 0d 0a 54 72 61 6e 73 66 65 72 2d 45 6e 63  |-8..Transfer-Enc|
00d0  6f 64 69 6e 67 3a 20 63 68 75 6e 6b 65 64 0d 0a  |oding: chunked..|
...
2230  4f 72 74 68 6f 70 61 65 64 69 63 61 09 68 74 74  |Orthopaedica.htt|
2240  70 3a 2f 2f 69 6e 66 6f 72 6d 61 68 65 61 6c 74  |p://informahealt|
2250  68 63 61 72 65 2e 63 6f 6d 2f 61 63 74 69 6f 6e  |hcare.com/action|
2260  2f 73 68 6f 77 46 65 65 64 3f 6a 63 3d 6f 72 74  |/showFeed?jc=ort|
2270  26 74 79 70 65 3d 65 74 6f 63 26 66 65 65 64 3d  |&type=etoc&feed=|
2280  72 73 73 09 31 37 34 35 2d 33 36 37 34 09 31 37  |rss.1745-3674.17|
2290  34 35 2d 33 36 38 32 0a 32 32 31 09 41 63 74 61  |45-3682.221.Acta|
22a0  20 4f 72 74 6f 70 c3 a9 64 69 63 61 20 42 72 61  | Ortop..dica Bra|
22b0  73 69 6c 65 69 72 61 09 68 74 74 70 3a 2f 2f 77  |sileira.http://w|

best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Character problems with tictoc
On Mon, Dec 21, 2009 at 2:09 PM, Glen Newton glen.new...@nrc-cnrc.gc.ca wrote: The file I got with wget is: http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt (Just to convince myself I'm not going nuts...) - this file, which Glen downloaded with wget, appears double-encoded:

# curl -s http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt | od -a -t x1 | head -1082 | tail -4
0020660   -   3   6   8   2  nl   2   2   1  ht   A   c   t   a  sp   O
         2d  33  36  38  32  0a  32  32  31  09  41  63  74  61  20  4f
0020700   r   t   o   p   C etx   B   )   d   i   c   a  sp   B   r   a
         72  74  6f  70  c3  83  c2  a9  64  69  63  61  20  42  72  61

- Godmar
Re: [CODE4LIB] SerialsSolutions Javascript Question
On Wed, Oct 28, 2009 at 9:49 PM, Michael Beccaria mbecca...@paulsmiths.edu wrote: I should clarify. The most granular piece of information in the html is a class attribute (i.e. there is no id). Here is a snippet:

<div class="SS_Holding" style="background-color: #CECECE">
<!-- Journal Information -->
<span class="SS_JournalTitle"><strong>Annals of forest science.</strong></span>&nbsp;<span class="SS_JournalISSN">(1286-4560)</span>

I want to alter the <span class="SS_JournalISSN">(1286-4560)</span> section. Maybe add some html after the issn that tells whether it is peer reviewed or not. Yes - you'd write code similar to this one:

$(document).ready(function () {
  $(".SS_JournalISSN").each(function () {
    var issn = $(this).text().replace(/[^\dxX]/g, "");
    var self = this;
    $.getJSON("http://xissn.worldcat.org/webservices/xid/issn/" + issn + "?method=getMetadata&format=json&callback=?",
      function (data) {
        $(self).append( data ... [ 'is peer reviewed' ] );
      });
  });
});

- Godmar
Re: [CODE4LIB] Setting users google scholar settings
It used to be you could just GET the corresponding form, e.g.: http://scholar.google.com/scholar_setprefs?num=10&instq=&inst=sfx-f7e167eec5dde9063b5a8770ec3aaba7&q=einstein&inststart=0&submit=Save+Preferences - Godmar On Wed, Jul 15, 2009 at 3:17 AM, Stuart Yeates stuart.yea...@vuw.ac.nz wrote: It's possible to send users to google scholar using URLs such as: http://scholar.google.co.nz/schhp?hl=en&inst=8862113006238551395 where the institution is obtained using the standard preference setting mechanism. Has anyone found a way of persisting this setting in the user's browser, so when they start a new session this is the default? Yes, I know they can go Scholar Preferences - Save to persist it, but I'm looking for a more automated way of doing it... cheers stuart
Re: [CODE4LIB] tricky mod_rewrite
On Wed, Jul 1, 2009 at 4:58 AM, Peter Kiraly pkir...@tesuji.eu wrote: Hi Eric, try this:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ script.cgi?param1=$1 [L,QSA]
</IfModule>

Here's a challenge question: is it possible to write this without hardwiring the RewriteBase in it? So that it can be used, for instance, in an .htaccess file from within any /path? - Godmar
Re: [CODE4LIB] tricky mod_rewrite
On Wed, Jul 1, 2009 at 9:13 AM, Peter Kiraly pkir...@tesuji.eu wrote: From: Godmar Back god...@gmail.com is it possible to write this without hardwiring the RewriteBase in it? So that it can be used, for instance, in an .htaccess file from within any /path? Yes, you can put it into a .htaccess file, and the URL rewrite will apply on that directory only. You misunderstood the question; let me rephrase it: Can I write a .htaccess file without specifying the path where the script will be located in RewriteBase? For instance, consider http://code.google.com/p/tictoclookup/source/browse/trunk/standalone/.htaccess Here, anybody who wishes to use this code has to adapt the .htaccess file to their path and change the RewriteBase entry. Is it possible to write a .htaccess file that works *no matter* where it is located, entirely based on where it is located relative to the Apache root or an Apache directory? - Godmar
Re: [CODE4LIB] tricky mod_rewrite
On Wed, Jul 1, 2009 at 10:18 AM, Walker, David dwal...@calstate.edu wrote: Is it possible to write a .htaccess file that works *no matter* where it is located I don't believe so. If the .htaccess file lives in a directory inside of the Apache root directory, then you _don't_ need to specify a RewriteBase. It's really only necessary when .htaccess lives in a virtual directory outside of the Apache root. I see. Unfortunately, that's the common deployment case for non-administrators (many librarians). They can create .htaccess files, but don't always have control of the main Apache httpd.conf or the root directory. - Godmar
Re: [CODE4LIB] tricky mod_rewrite
On Wed, Jul 1, 2009 at 10:38 AM, Walker, David dwal...@calstate.edu wrote: They can create .htaccess files, but don't always have control of the main Apache httpd.conf or the root directory. Just to be clear, I didn't mean just the root directory itself. If .htaccess lives within a sub-directory of the Apache root, then you _don't_ need RewriteBase. RewriteBase is only necessary when you're in a virtual directory, which is physically located outside of Apache's DocumentRoot path. Correct me if I'm wrong. You are correct! If I omit the RewriteBase, it still works in this case. Let's have some more of that sendmail koolaid and up the challenge. How can I write an .htaccess that's path-independent if I'd like to exclude certain files in that directory, such as index.html? So far, I've been doing:

RewriteCond %{REQUEST_URI} !^/services/tictoclookup/standalone/index.html

to avoid running my script for index.html. How would I do that without hardwiring the path? (Hint: the use of SERVER variables on the right-hand side in the CondPattern of a RewriteCond is not allowed, but some trickery may be possible, according to http://www.issociate.de/board/post/495372/Server-Variables_in_CondPattern_of_RewriteCond_directive.html) - Godmar
Re: [CODE4LIB] How to access environment variables in XSL
Let me repeat a small comment I already sent to Mike in private email: in a J2EE environment, information that characterizes a request (such as path, remote addr, etc.) is not accessible in environment variables or properties, unlike in a CGI environment. That means that even if you write an extension for XALAN-J to trigger the execution of your Java code while processing a stylesheet during a request, you don't normally obtain access to this information. Rather, it is passed by the servlet container to the servlet via a request object. If you don't control the servlet code - say because it's vendor-provided - then you have to either rely on any extension functionality the vendor may provide, or you have to create your own servlet that wraps the vendor's servlet, saving the request information somewhere where your xalan extension can retrieve it, then forwarding the request to the vendor's servlet. - Godmar On Tue, Jun 23, 2009 at 2:04 PM, Cloutman, David dclout...@co.marin.ca.us wrote: I'm in a similar situation in that I've spent the last 6 months cramming XSLT in order to do output from an application provided by a vendor. In my situation, I'm taking information stored in a CMS database as XML fragments and transforming it into our Web site's pages. (The CMS is called Cascade, and is okay, but not fantastic.) The tricky part of this situation is that simply grabbing a book on XPath and XSLT will not tell you everything you need to know in order to work with your proprietary software. Neither will simply knowing what language the middleware layer is written in. Specifically, you need to find out from your vendor what XSLT processor their application uses. In my case, I found out that my CMS uses Xalan, which impacts my situation significantly, since it limits me to XSLT 1.0. However, the Xalan processor does allow for one to script extensions, and in my case I _might_ be able to leverage that fact to access some system information, depending on what capabilities my vendor has given me. So, in short, making the most of the development environment you have in creating your XSLT will require you not only to grok the complexities of what I think is a rather difficult language to master, but also to gain a good understanding of what tools are and are not available to you through your processor. Just to address your original question, XSLT really is not designed to work like a conventional programming language per se. You may or may not have direct access to environment variables. That is dependent upon how the XSLT processor is implemented by your vendor. I did see some creative ideas in other posts, and I do not know if they will or will not work. However, it is often possible for the middleware layer to pass data to the XSLT processor, thus exposing it to the XSLT developer. However, what data gets passed to the XSLT developer is generally under the control of the application developer. Here is a quick example of how XML data and XSLT presentation logic can be glued together in PHP using a non-native XSLT processor. This is being done similarly by our respective Java applications, using different XSLT processors, and hopefully a lot more error checking. http://frenzy.marinlibrary.org/code-samples/php-xslt/middleware.php In the example, I have passed some environment data to the XSLT processor from the PHP middleware layer. As you will see, what data is exposed is entirely determined by the PHP. Good luck! 
- David --- David Cloutman dclout...@co.marin.ca.us Electronic Services Librarian Marin County Free Library -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Doran, Michael D Sent: Friday, June 19, 2009 2:53 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] How to access environment variables in XSL Hi Dave, What XSLT processor and programming language are you using? I'm embarrassed to say that I'm not sure. I'm making modifications and enhancements to already existing XSL pages that are part of the framework of Ex Libris' new Voyager 7.0 OPAC. This new version of the OPAC is running under Apache Tomcat (on Solaris) and my assumption is that the programming language is Java; however the source code for the app itself is not available to me (and I'm not a Java programmer anyway, so it's a moot point). I assume also that the XSLT processor is what comes with Solaris (or Tomcat?). As you can probably tell, this stuff is new to me. I've been trying to take a Sun Ed XML/XSL class for the last year, but it keeps getting cancelled for lack of students. Apparently I'm the last person left in the Dallas/Fort Worth area that needs to learn this stuff. ;-) -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # do...@uta.edu # http://rocky.uta.edu/doran/ -Original Message- From: Code for
Re: [CODE4LIB] How to access environment variables in XSL
Running in a J2EE environment is somewhat different from running in a CGI environment. Specifically, variables such as REMOTE_ADDR, etc. are not stored in environment variables that are easily accessible. Assuming that your XSLT is executed for each request (which, btw, is not a given since Voyager may well be caching the results of the style-sheet application), your vendor may set up the XSLT processor environment to provide access to variables related to the current request, for instance, via XALAN-J extensions. If they did that, it would probably be in the documentation to which you have access under NDA. If not, things will be a lot more complicated. You'll probably have to wrap the servlet in your own; store the current servlet request in a thread-local variable, then create an xalan extension to access it during the XSLT processing. That requires a fair bit of Java/J2EE trickery, but is definitely possible (and will likely void your warranty.) - Godmar On Fri, Jun 19, 2009 at 9:42 PM, Tom Pasley tom.pas...@gmail.com wrote: Hi, I see Michael's here too - (he's a bit of a guru on the Voyager-L listserv :-D). Michael, if you have a look at the Vendor URL, there's some info there, but you might also try having a look through some of these G.search results: site:xml.apache.org inurl:xalan-j system - see if that helps any - like to help more, but I've got to go! Tom On Sat, Jun 20, 2009 at 10:11 AM, Doran, Michael D do...@uta.edu wrote: Hi Jon, Try putting somewhere in one of the xslt pages Cool! Here's the output: Version: 1 Vendor: Apache Software Foundation Vendor URL: http://xml.apache.org/xalan-j -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # do...@uta.edu # http://rocky.uta.edu/doran/ -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jon Gorman Sent: Friday, June 19, 2009 5:05 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] How to access environment variables in XSL Try putting somewhere in one of the xslt pages

<p>
Version: <xsl:value-of select="system-property('xsl:version')" /> <br />
Vendor: <xsl:value-of select="system-property('xsl:vendor')" /> <br />
Vendor URL: <xsl:value-of select="system-property('xsl:vendor-url')" />
</p>

Jon On Fri, Jun 19, 2009 at 4:53 PM, Doran, Michael D do...@uta.edu wrote: Hi Dave, What XSLT processor and programming language are you using? I'm embarrassed to say that I'm not sure. I'm making modifications and enhancements to already existing XSL pages that are part of the framework of Ex Libris' new Voyager 7.0 OPAC. This new version of the OPAC is running under Apache Tomcat (on Solaris) and my assumption is that the programming language is Java; however the source code for the app itself is not available to me (and I'm not a Java programmer anyway, so it's a moot point). I assume also that the XSLT processor is what comes with Solaris (or Tomcat?). As you can probably tell, this stuff is new to me. I've been trying to take a Sun Ed XML/XSL class for the last year, but it keeps getting cancelled for lack of students. Apparently I'm the last person left in the Dallas/Fort Worth area that needs to learn this stuff. 
;-) -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # do...@uta.edu # http://rocky.uta.edu/doran/ -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Walker, David Sent: Friday, June 19, 2009 2:48 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] How to access environment variables in XSL Micahael, What XSLT processor and programming language are you using? --Dave == David Walker Library Web Services Manager California State University http://xerxes.calstate.edu From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Doran, Michael D [do...@uta.edu] Sent: Friday, June 19, 2009 12:44 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] How to access environment variables in XSL I am working with some XSL pages that serve up HTML on the web. I'm new to XSL. In my prior web development, I was accustomed to being able to access environment variables (and their values, natch) in my CGI scripts and/or via Server Side Includes. Is there an equivalent mechanism for accessing those environment variables within an XSL page? These are examples of the variables I'm referring to: SERVER_NAME SERVER_PORT HTTP_HOST DOCUMENT_URI REMOTE_ADDR
Re: [CODE4LIB] FW: [CODE4LIB] Newbie asking for some suggestions with javascript
On Mon, Jun 15, 2009 at 4:09 PM, Roy Tennant tenna...@oclc.org wrote: It is worth following up on Xiaoming's statement of a limit of 100 uses per day of the xISSN service with the information that exceptions to this limit are certainly granted. Annette probably knows that just such an exception was granted to her LibX project, and LibX remains the single largest user of this service. Roy Yes, Roy is correct. We are very grateful for OCLC's generous support and would like to acknowledge that publicly. FWIW, I suggested the inclusion of ticTOCs RSS feed data in the survey OCLC sent out two weeks ago, and less than a week later, OCLC rolls out the improved service. Excellent! [ As an aside, in LibX, we are changing the way we use the service; previously, we were looking up all ISSNs on any page a user visits; we are now retrieving the metadata only if the user actually hovers over the link. Not that OCLC complained - but CrossRef did when they noticed 100,000 hits per day against their service for DOI metadata lookups. In fairness to CrossRef, they are working on beefing up their servers as well. ] - Godmar Annette for Team LibX.
Re: [CODE4LIB] Newbie asking for some suggestions with javascript
Yes - see this email: http://serials.infomotions.com/code4lib/archive/2009/200905/0909.html If you can host it yourself, the stand-alone version is efficient and easy to keep up to date - just run a cronjob that downloads the text file from JISC. My WSGI script will automatically pick it up if it has changed on disk. - Godmar On Thu, Jun 11, 2009 at 4:08 PM, Annette Bailey afbai...@vt.edu wrote: Godmar Back wrote a web service in Python for ticTOC with an eye to incorporating links into III's Millennium catalog. http://code.google.com/p/tictoclookup/ http://tictoclookup.appspot.com/ Annette On Thu, Jun 11, 2009 at 12:34 PM, Derik Badman dbad...@temple.edu wrote: Hello all, Just joined the list, and I'm hoping to get a suggestion or two. I'm working on using the ticTOCs ( http://www.tictocs.ac.uk/ ) text file of RSS feed URLs for journals to insert links to those feeds in our Serials Solution Journal Finder. I've got it working using a bit of jQuery. Demo here: http://155.247.22.22/badman/toc/demo.html The javascript is here: http://155.247.22.22/badman/toc/toc-rss.js Getting that working wasn't too hard, but I'm a bit concerned about efficiency and caching. I'm not sure the way I'm checking issns against the text file is the most efficient way to go. Basically I'm making an ajax call to the file that takes the data and makes an array of objects. I then query the issn of each journal on the page against the array of objects. If there's a match I pull the data and put it on the page. I'm wondering if there's a better way to do this, especially since the text file is over 1mb. I'm not looking for code, just ideas. I'm also looking for any pointers about using the file itself and somehow auto-downloading it to my server on a regular basis. Right now I just saved a copy to my server, but in the future it'd be good to automate grabbing the file from ticTOCs server on a regular basis and updating the one on my server (perhaps I'd need to use a cron job to do that?). Thanks so much for any suggestions or pointers. (For what it's worth, I can manage with javascript or php.) -- Derik A. Badman Digital Services Librarian Reference Librarian for Education and Social Work Temple University Libraries Paley Library 209 Philadelphia, PA Phone: 215-204-5250 Email: dbad...@temple.edu AIM: derikbad Research makes time march forward, it makes time march backward, and it also makes time stand still. -Greil Marcus
Re: [CODE4LIB] A Book Grab by Google
On Wed, May 20, 2009 at 8:42 PM, Karen Coyle li...@kcoyle.net wrote: No, it's not uniquely Google, but adding another price pressure point to libraries is still seen as detrimental. I'm sure you saw: http://www.nytimes.com/2009/05/21/technology/companies/21google.html The new agreement, which Google hopes other libraries will endorse, lets the University of Michigan object if it thinks the prices Google charges libraries for access to its digital collection are too high, a major concern of some librarians. Any pricing dispute would be resolved through arbitration. - Godmar
Re: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes
On Tue, May 19, 2009 at 8:26 AM, Boheemen, Peter van peter.vanbohee...@wur.nl wrote: Clever idea to put the TicToc stuff 'in the cloud'. How are you going to keep it up to date? By periodically reuploading the entire set (which takes about 15-20 mins), new or changed records can be updated. A changed record is one with a new RSS feed for the same ISSN + Title combination; the data is keyed by ISSN+Title. This process can be optimized by only uploading the delta (you upload .csv files, so the delta can be obtained easily via comm(1)). Removing records is a bit of a hassle since GAE does not provide an easy-to-use interface for that. It's possible to wipe an entire table clean by repeatedly deleting 500 records at a time (the entire set is about 19,000 records), then doing a fresh import. This can be done by uploading a console application into the cloud. (http://con.appspot.com/console/help/about ) Alternatively, smaller sets of records can be deleted via a remove handler, which I haven't implemented yet. A script will need to post the data to be removed against the handler. Will do that though if anybody uses it. User impact is low if old records aren't removed. A possible alternative is to have the GAE app periodically verify the validity of each requested record with a server we'd have to run. (Pulling the data straight from tictocs.ac.uk doesn't work since it's larger than what you're allowed to fetch.) This approach would somewhat defeat the idea of the cloud since we'd have to rely on keeping that server operational, albeit at a lower degree of availability and load. Another potential issue is the quota Google provides: you get 10GBytes and 1.3M requests free per 24 hour period, then they start charging you ($.12 per GByte). I think I mentioned in my post that I included a non-GAE version of the server that only requires mod_wsgi. For that standalone version, keeping the data up to date is implemented by checking the last mod time of its local copy - it will reread its data when it detects a more recent jrss.txt in its current directory, so keeping its data up to date is as simple as periodically curling http://www.tictocs.ac.uk/text.php - Godmar
[CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes
Hi, I would like to share a few pointers to web services and widgets Annette and I recently collaborated on. All are available under an open source license. Widgets are CSS-styled HTML elements (span or div) that provide dynamic behavior related to the underlying web service. These are suitable for non-JavaScript programmers familiar with HTML/CSS. 1. MAJAX 2: Includes a JSON web service (e.g., http://libx.lib.vt.edu/services/majax2/isbn/1412936373 or http://libx.lib.vt.edu/services/majax2/isbn/006073132x?opacbase=http%3A%2F%2Flibcat.lafayette.edu%2Fsearch&jsoncallback=majax.processResults ) and a set of widgets to include results into web pages, see http://libx.lib.vt.edu/services/majax2/ Supports the same set of features as MAJAX 1 (libx.org/majax) Source is at http://code.google.com/p/majax2/ 2. ticTOC lookup: is a Google App Engine app that provides a REST interface to JISC's ticTOC data set that maps ISSNs to URLs of table-of-contents RSS feeds. See http://tictoclookup.appspot.com/ Example: http://tictoclookup.appspot.com/0028-0836 and optional refinement by title: http://tictoclookup.appspot.com/0028-0836?title=Nature A widget library is available; see http://laurel.lib.vt.edu/record=b1251610~S7 for a demo (shows floating tooltips with table of contents preview via Google Feeds and places a link to RSS feeds) The source is at http://code.google.com/p/tictoclookup/ and includes a stand-alone version of the web service which doesn't use GAE. The widget library includes support for integration into III's record display. 3. Google Book Classes at http://libx.lib.vt.edu/services/googlebooks/ - these are widgets for Google's Book Search Dynamic Links API. Noteworthy is support for integration into III's OPAC on the search results page (briefcit.html), on the so-called bib display page (bib_display.html), and their WebBridge product via field selectors, all without JavaScript. Source is at http://code.google.com/p/googlebooks/ 4. A Link/360 JSON Proxy. See http://libx.lib.vt.edu/services/link360/index.html This one takes Serials Solutions' Link/360 XML Service and proxies it as JSON. Currently does not include a widget set. Caches results 24 hours to match db update frequency. Source is at http://code.google.com/p/link360/ Could be combined with a widget library, or programmed against directly, to weave Link/360 holdings data into pages. All JSON services accept 'jsoncallback=' for cross-domain client-side integration. The libx.lib.vt.edu URLs are ok to use for testing, but for production use we recommend your own server. All modules are written in Python as WSGI scripts, requiring setup as simple as mod_wsgi + .htaccess. - Godmar
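As a consumption sketch: the 'jsoncallback=' parameter means any of these services can be used cross-domain with nothing more than jQuery. The URL is the MAJAX 2 example above; the structure of the returned record isn't shown in this thread, so the final display step is illustrative:

// Fetch MAJAX 2 data for an ISBN via JSONP; jQuery substitutes a
// generated callback function name for the trailing "?".
$.getJSON("http://libx.lib.vt.edu/services/majax2/isbn/1412936373?jsoncallback=?",
  function (data) {
    // inspect the actual JSON for the real field names
    $("#holdings").text(JSON.stringify(data));
  });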
Re: [CODE4LIB] Q: AtomPub (APP) server libraries for Python?
2) an XML library that doesn't choke on foreign characters. (I assume you're using ElementTree now?) I meant foreign markup, as in foreign to the atom: namespace. Let me give an example. Suppose I want to serve results the way Google does in YouTube; suppose I want to return XML similar to this one: http://gdata.youtube.com/feeds/api/videos?vq=triumph+street+triple&racy=include&orderby=viewCount It contains lots of foreign XML (opensearch, etc.) and it contains lots of boilerplate (title, link, id, updated, category, etc. etc.) that must be gotten right to be Atom-compliant. I don't want to implement any of this. I'd like to write the minimum amount of code that can turn information I have in flat files into Atom documents, without having to worry about the well-formedness or even construction of an Atom feed, or its internal consistency. (Perhaps similar to Pilgrim's feedparser, except that that library (a) doesn't handle all of Atom, (b) doesn't support foreign XML - in fact, doesn't even use an XML library - and is generally not intended for the creation of feeds.) Given the adoption RFC 5023 has seen by major companies, I'm really surprised at the lack of any supporting server libraries; perhaps not surprisingly, the same is not true for client libraries. - Godmar On Wed, Jan 28, 2009 at 9:43 AM, Ross Singer rossfsin...@gmail.com wrote: Godmar, What do you need the library to do? It seems like you'd be able to make an AtomPub server pretty easily with web.py (you could use the Jangle Core as a template, it's in Ruby, but the framework it uses, Sinatra, is very similar to web.py). It seems like there are two things you need here: 1) something that can RESTfully broker a bunch of incoming HTTP requests and return Atom Feeds and Service documents Is that right? -Ross. On Wed, Jan 28, 2009 at 8:13 AM, Godmar Back god...@gmail.com wrote: Hi, does anybody know or can recommend any server side libraries for Python that produce AtomPub (APP)? Here are the options I found, none of which appear suitable for what I'd like to do: amplee: http://mail.python.org/pipermail/python-announce-list/2008-February/006436.html django-atompub: http://code.google.com/p/django-atompub/ flatatompub http://blog.ianbicking.org/2007/09/12/flatatompub/ Either they are immature, or require frameworks, or form frameworks, and most cannot well handle foreign XML. - Godmar
Re: [CODE4LIB] COinS in OL?
On Thu, Dec 4, 2008 at 2:31 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: Not that I know of. You can say display:none, but that'll probably hide it from LibX etc too.

No, why would it. BTW, I don't see why screen readers would stumble over this when the child of the span is empty. Do they try to read empty text? And if a COinS is processed, we fix up the title so tooltips show nicely.

- Godmar

What is needed is a CSS @media type for screen readers, like the one that exists for 'print'. Then you could have a separate stylesheet for screen readers, like you can have a separate stylesheet for print. That would be the right way to do it. But it doesn't exist. Jonathan

Thomas Dowling wrote: On 12/04/2008 02:02 PM, Jonathan Rochkind wrote: Yeah, I had recently noticed independently, been unhappy with the way a COinS title shows up in mouse-overs, and is recommended to be used by screen readers. Oops. By any chance, do current screen readers honor something like '<span class="Z3988" style="speak:none" title="...">'? -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] COinS in OL?
On Fri, Dec 5, 2008 at 1:14 PM, Ross Singer [EMAIL PROTECTED] wrote: On Fri, Dec 5, 2008 at 10:50 AM, Godmar Back [EMAIL PROTECTED] wrote: BTW, I don't see why screen readers would stumble over this when the child of the span is empty. Do they try to read empty text? And if a COinS is processed, we fix up the title so tooltips show nicely. Thinking about this a bit more -- does this leave the COinS in an unusable state if some other agent executes after LibX is done?

I spoke too soon. We don't touch the 'title' attribute. But we put content in the previously empty <span></span>, so there is a potential problem with a screen reader then. (That content, though, has its own 'title' attribute.)

- Godmar
Re: [CODE4LIB] COinS in OL?
On Wed, Dec 3, 2008 at 9:12 PM, Ed Summers [EMAIL PROTECTED] wrote: On Tue, Dec 2, 2008 at 3:11 PM, Godmar Back [EMAIL PROTECTED] wrote: COinS are still needed, in particular in situations in which multiple resources are displayed on a page (like, for instance, in the search results pages of most online systems or on pages such as http://citeulike.org, or in a list of references such as in the references section of many Wikipedia pages.) JSON is perfectly capable of returning a list of things.

True, but that's beside the point. The metadata needs to be related to some element on the page, such as the text of a reference. The most natural way to do this (and COinS allows this) is to place the COinS next to (for instance) the reference to which it refers.

- Godmar
Re: [CODE4LIB] COinS in OL?
Having a per-page link to get an alternate representation of a resource is certainly helpful for some applications, and please do support it, but don't consider the problem solved. The primary weakness of this approach is that it works only if a page is dedicated to a single resource. COinS are still needed, in particular in situations in which multiple resources are displayed on a page (like, for instance, in the search results pages of most online systems, on pages such as http://citeulike.org, or in a list of references such as the references section of many Wikipedia pages.)

- Godmar

On Mon, Dec 1, 2008 at 11:21 PM, Ed Summers [EMAIL PROTECTED] wrote: On Mon, Dec 1, 2008 at 11:05 PM, Karen Coyle [EMAIL PROTECTED] wrote: I asked about COinS because it's something I have vague knowledge of. (And I assume it isn't too difficult to implement.) However, if there are other services that would make a bigger difference, I invite you (all) to speak up. It makes little sense to have this large quantity of bib data if it isn't widely and easily usable. Sorry to be overwhelming. I guess the main thing I wanted to communicate is that you could simply add: <link rel="alternate" type="application/json" href="http://openlibrary.org/api/get?key=/b/{open-library-id}" /> to the head element in OpenLibrary HTML pages for books, and that would go a long way to making machine-readable data for books discoverable by web clients. //Ed
Re: [CODE4LIB] COinS in OL?
Correct. Right now, COinS handling in LibX 1.0 is primitive and always links to the OpenURL resolver. However, LibX 2.0 will allow customized handling so that, for instance, ISBN COinS can be treated differently than dissertation COinS or article COinS. The framework for this is already partially in place, so ambitious JavaScript programmers can implement such custom handling for their extension; with LibX 2.0, every LibX maintainer will be able to choose their own preferred way of making use of COinS.

When you place COinS, don't assume they'll only be used by tools that simply read the info from them - place them at a spot in your DOM where there's some white space, or where placing a small link or icon would not destroy the look and feel of your interface.

- Godmar

On Mon, Dec 1, 2008 at 11:45 AM, Stephens, Owen [EMAIL PROTECTED] wrote: LibX uses COinS as well I think - so it would generally be useful in taking people from the global context (Open Library) to the local (via LibX). Owen Owen Stephens Assistant Director: eStrategy and Information Resources Central Library Imperial College London South Kensington Campus London SW7 2AZ t: +44 (0)20 7594 8829 e: [EMAIL PROTECTED]

-Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Karen Coyle Sent: 01 December 2008 16:08 To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] COinS in OL? I have a question to ask for the Open Library folks and I couldn't quite figure out where to ask it. This seems like a good place. Would it be useful to embed COinS in the book pages of the Open Library? Does anyone think they might make use of them? Thanks, kc -- --- Karen Coyle / Digital Library Consultant [EMAIL PROTECTED] http://www.kcoyle.net ph.: 510-540-7596 skype: kcoylenet fx.: 510-848-3913 mo.: 510-435-8234
[CODE4LIB] GAE sample (was: a brief summary of the Google App Engine)
FWIW, the sample application I built to familiarize myself with GAE is a simple REST cache. It's written in 250 lines overall, including Python + YAML. For instance, a resource such as: http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=xml&id=3966282 can be accessed via GAE using: http://libxcache.appspot.com/get?url=http%3a%2f%2fwww.ncbi.nlm.nih.gov%2fentrez%2feutils%2fesummary.fcgi%3fdb%3dpubmed%26retmode%3dxml%26id%3d3966282 Or, you can access: http://demo.jangle.org/openbiblio/resources/5974 as http://libxcache.appspot.com/get?url=http%3a%2f%2fdemo.jangle.org%2fopenbiblio%2fresources%2f5974 (To take some load off that Jangle demo, Ross, in case it's slashdotted.)

- Godmar
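Constructing such a cache URL is just percent-encoding of the target URL. A quick sketch (the libxcache.appspot.com endpoint is the one named above):

```python
# Build a cache URL of the kind shown above by percent-encoding
# the target URL; safe="" makes quote() also encode ':' and '/'.
from urllib.parse import quote

CACHE = "http://libxcache.appspot.com/get?url="

def cached(url):
    return CACHE + quote(url, safe="")

print(cached("http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
             "?db=pubmed&retmode=xml&id=3966282"))
```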
Re: [CODE4LIB] a brief summary of the Google App Engine
On Tue, Jul 15, 2008 at 2:16 PM, Fernando Gomez [EMAIL PROTECTED] wrote: Any thoughts about a convenient way of storing and (more importantly) indexing and retrieving MARC records using GAE's Bigtable?

GAE's datastore offers an object model similar to Django's object-relational model. You can define a Python class that inherits from db.Model and declare the properties of your model; instances can then be created, stored, retrieved, and updated. GAE performs automatic indexing on some fields, and you can tell it to index on others, or on certain combinations. Aside from the limitations imposed by the index model, the problem then is fundamentally similar to how you index MARC data for use in any discovery system. Presumably, you could learn from the experiences of the many projects that have done that - some in Python, such as http://code.google.com/p/fac-back-opac/ (though they use Django, they don't appear to be using its object-relational db model for MARC records; I say this from a 2-minute examination of parts of their code; I may be wrong. PyMarc itself doesn't support it.)

- Godmar
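For concreteness, a sketch of what such a model might look like with GAE's original Python datastore API (google.appengine.ext.db); the choice of fields here is hypothetical:

```python
# Hypothetical GAE datastore model for indexed MARC fields,
# using the original Python db API (google.appengine.ext.db).
from google.appengine.ext import db

class MarcRecord(db.Model):
    # these properties are indexed by default, so they can appear in queries
    isbn = db.StringProperty()
    author = db.StringProperty()
    title = db.StringProperty()
    raw = db.TextProperty()  # full MARC/MARCXML blob; TextProperty is unindexed

rec = MarcRecord(isbn="0201310090", author="Lea, Doug",
                 title="Concurrent Programming in Java")
rec.put()

for r in MarcRecord.all().filter("author =", "Lea, Doug"):
    print(r.title)
```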
[CODE4LIB] a brief summary of the Google App Engine
Hi, since I brought up the issue of the Google App Engine (GAE) (or similar services, such as Amazon's EC2 Elastic Compute Cloud), I thought I'd give a brief overview of what it can and cannot do, such that we may judge its potential use for library services.

GAE is a cloud infrastructure into which developers can upload applications. These applications are replicated among Google's network of data centers, and they have access to its computational resources. Each application has access to a certain amount of resources at no fee; Google recently announced the pricing for applications whose resource use exceeds the no-fee threshold [1]. The no-fee threshold is rather substantial: 500MB of persistent storage and, according to Google, enough bandwidth and cycles to serve about 5 million page views per month.

GAE applications must be written in Python. They run in a sandboxed environment. This environment limits what applications can do and how they communicate with the outside world. Overall, the sandbox is very flexible - in particular, application developers have the option of uploading additional Python libraries of their choice with their application. The restrictions lie primarily in security and resource management. For instance, you cannot use arbitrary socket connections (all outside-world communication must go through GAE's fetch service, which supports http/https only), you cannot fork processes or threads (which would use up CPU cycles), and you cannot write to the filesystem (instead, you must store all of your persistent data in Google's scalable data storage, which is also known as BigTable).

All resource usage (CPU, bandwidth, persistent storage - though not memory) is accounted for, and you can see your use in the application's dashboard control panel. Resources are replenished on the fly where possible, as in the case of CPU and bandwidth. Developers are currently restricted to 3 applications per account. Making applications in multiple accounts work in tandem to work around quota limitations is against Google's terms of use.

Applications are described by a configuration file that maps URI paths to scripts, in a manner similar to how you would use Apache mod_rewrite. URIs can also be mapped to explicitly named static resources such as images. Static resources are uploaded along with your application and, like the application, are replicated in Google's server network.

The programming environment is CGI 1.1. Google suggests, but doesn't require, the use of supporting libraries for this model, such as WSGI. This use of high-level libraries allows applications to be written in a very compact, high-level style, the way one is used to from Python. In addition to the WSGI framework, this allows the use of several template libraries, such as Django's. Since the model is CGI 1.1, there are no or very few restrictions on what can be returned - you can return, for instance, XML or JSON, and you have full control over the Content-Type: returned.

The execution model is request-based. If a client request arrives, GAE will start a new instance (or reuse an existing instance if possible), then invoke the main() method. At this point, you have a set time limit to process this request (though not explicitly stated in Google's docs, the limit appears to be currently 9 seconds) and return a result to the client. Note that this per-request limit is a maximum; you should usually be much quicker in your response.
Also note that any CPU cycles you use during those 9 seconds (but not the time spent waiting to fetch results from other application tiers) count against your overall CPU budget.

The key service the GAE runtime libraries provide is the Google datastore, aka BigTable [2]. You can think of this service as a highly efficient, persistent store for structured data. You may think of it as a simplified database that allows the creation, retrieval, updating, and deletion (CRUD) of entries using keys and, optionally, indices. It provides limited support for transactions as well. Though it is less powerful than conventional relational databases - which aren't nearly as scalable - it can be accessed using GQL, a query language that's similar in spirit to SQL. Notably, GQL (or BigTable) does not support JOINs, which means that you will have to adjust your traditional approach to database normalization.

The Python binding for the structured data is intuitive and seamless. You simply declare a Python class for the objects you wish to store, along with the types of the properties you wish included, and you can subsequently use a put() or delete() method to write and delete. Queries will return instances of the objects you placed in a given table. Tables are named after the Python classes. A short sketch follows below.

Google provides a number of additional runtime libraries, such as for simple image processing a la Google Picasa, for the sending of email (subject to resource limits), and for user authentication, solely using Google accounts.
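As a sketch of the CRUD-plus-GQL flavor described above, using the original Python db API; the entity kind and properties are illustrative:

```python
# Sketch of datastore CRUD and a GQL query, per the description above.
# The 'Holding' kind and its properties are hypothetical examples.
from google.appengine.ext import db

class Holding(db.Model):
    issn = db.StringProperty()
    library = db.StringProperty()

h = Holding(issn="0028-0836", library="VT")
h.put()  # create (or update, if the entity already has a key)

# GQL looks like SQL, but there are no JOINs; filters use indexed properties
for row in db.GqlQuery("SELECT * FROM Holding WHERE issn = :1", "0028-0836"):
    print(row.library)

h.delete()
```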
Re: [CODE4LIB] anyone know about Inera?
Min, Eric, and others working in this domain - have you considered designing your software as a scalable web service from the get-go, using frameworks such as Google App Engine? You may be able to use Montepython for the CRF computations (http://montepython.sourceforge.net/). I know Min offers a WSDL wrapper around their software, but that's simply a gateway to one single-machine installation, and it's not intended as a production service at that. - Godmar

On Sat, Jul 12, 2008 at 3:20 AM, Min-Yen Kan [EMAIL PROTECTED] wrote: Hi Steve, all: I'm the key developer of ParsCit. I'm glad to hear your feedback about what doesn't work with ParsCit. Erik is correct in saying that we have only trained the system on data for which we have correct answers, namely computer science. As such it doesn't perform well with other data (especially health sciences citations, on which we have also done some pilot tests). I note that there are other citation parsers out there, including Erik's own HMM parser (I think Erik mentioned it as well; it's available from his website here: http://gales.cdlib.org/~egh/hmm-citation-extractor/). Anyway, I've tried your citation too, and got the same results from the demo -- it doesn't handle the authors correctly in this case. I would very much love to have as many example cases of incorrectly parsed citations as the community is willing to share with us so we can improve ParsCit (it's open source, so all can benefit from improvements to ParsCit). We are trying to be as proactive as possible about maintaining and improving ParsCit. I know of at least two groups that have said they are willing to contribute more citations (with correct markings) to us so that we can re-train ParsCit, and there is interest in porting it to other languages (i.e., German right now). We would love to get samples of your data too, where the program does go wrong, to help improve our system. And to get feedback on other fields that need to be parsed as well: ISSNs, ISBNs, volumes, and issues. We are also looking to make the output of the ParsCit system compatible with EndNote and BibTeX. We actually have an internal project to try to hook up ParsCit to find references on arbitrary web pages (to form something like Zotero that's not site-specific and non-template-based). If and when this project comes to fruition we'll be announcing it to the list. If anyone has used ParsCit and has feedback on what can be further improved, we'd love to hear from you. You are our target audience! Cheers, Min -- Min-Yen KAN (Dr) :: Assistant Professor :: National University of Singapore :: School of Computing, AS6 05-12, Law Link, Singapore 117590 :: 65-6516 1885 (DID) :: 65-6779 4580 (Fax) :: [EMAIL PROTECTED] (E) :: www.comp.nus.edu.sg/~kanmy (W) PS: Hi Erik, still planning on studying your HMM package for improving ParsCit ... It's on my agenda. Thanks again.

On Sat, Jul 12, 2008 at 5:36 AM, Steve Oberg [EMAIL PROTECTED] wrote: Yeah, I am beginning to wonder, based on these really helpful replies, if I need to scale back to what is doable and reasonable. And reassess ParsCit. Thanks to all for this additional information. Steve

On Fri, Jul 11, 2008 at 4:18 PM, Nate Vack [EMAIL PROTECTED] wrote: On Fri, Jul 11, 2008 at 3:57 PM, Steve Oberg [EMAIL PROTECTED] wrote: I fully realize how much of a risk that is in terms of reliability and maintenance. But right now I just want a way to do this in bulk with a high level of accuracy. How bad is it, really, if you get some (5%?) bad requests into your document delivery system?
Customers submit poor quality requests by hand with some frequency, last I checked... Especially if you can hack your system to deliver the original citation all the way into your doc delivery system, you may be able to make the case that 'this is a good service to offer; let's just deal with the bad parses manually.' Trying to solve this via pure technology is gonna get into a world of diminishing returns. A surprising number of citations in references sections are wrong. Some correct citations are really hard to parse, even by humans who look at a lot of citations. ParsCit has, in my limited testing, worked as well as anything I've seen (commercial or OSS), and much better than most. My $0.02, -Nate
Re: [CODE4LIB] use of OpenSearch response elements in libraries?
[ this discussion may be a bit too detailed for the general readership of code4lib; readers not interested in the upcoming WC search API may wish to skip... ]

Roy, Atom/RSS are simply the container formats used to return multiple items of some kind --- I'm curious about what those items contain. In the example shown in http://worldcat.org/devnet/index.php/SearchAPIDetails#Using_OpenSearch it appears that the items are only preformatted citations, rather than, for instance, MARCXML or DC representations of records. (The SRU interface, on the other hand, appears to return MARCXML/DC.) Is this impression false, and does the OpenSearch API in fact return record metadata beyond preformatted citations? (I note that your search syntax for OpenSearch does not allow the choice of a recordSchema.) If not, what's the rationale for not supporting the retrieval of record metadata via OpenSearch?

- Godmar

On Tue, Jun 24, 2008 at 10:17 AM, Roy Tennant [EMAIL PROTECTED] wrote: To be specific, currently supported record formats for an OpenSearch query of the WorldCat API are Atom and RSS, as well as the preformatted citation. Roy

On 6/23/08 10:18 PM, Godmar Back [EMAIL PROTECTED] wrote: Thanks --- let me do some query refinement then -- does anybody know of examples where record metadata (e.g., MARCXML or DC) is returned as an OpenSearch response? [ If I understand the proposed WorldCat API correctly, OpenSearch is used only for pre-formatted citations in HTML. ] - Godmar

On Tue, Jun 24, 2008 at 12:54 AM, Roy Tennant [EMAIL PROTECTED] wrote: I believe WorldCat qualifies, although the API is not yet ready for general release (but soon): http://worldcat.org/devnet/index.php/SearchAPIDetails Roy

On 6/23/08 8:55 PM, Godmar Back [EMAIL PROTECTED] wrote: Hi, are there any examples of functioning OpenSearch interfaces to library catalogs or library information systems? I'm specifically interested in those that not only advertise a text/html interface to their catalog, but that include OpenSearch response elements. One example I've found is Evergreen, though it's not clear to what extent this interface is used or implemented. For instance, their demo installation's OpenSearch description advertises an Atom feed, but what's returned doesn't validate. (*) Are there other examples deployed (and does anybody know of applications that consume OpenSearch feeds?) - Godmar

(*) See, for instance: http://demo.gapines.org/opac/extras/opensearch/1.1/PINES/atom-full/keyword/?searchTerms=music&startPage=&startIndex=&count=&searchLang which is not a valid Atom feed: http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fdemo.gapines.org%2Fopac%2Fextras%2Fopensearch%2F1.1%2FPINES%2Fatom-full%2Fkeyword%2F%3FsearchTerms%3Dmusic%26startPage%3D%26startIndex%3D%26count%3D%26searchLang
Re: [CODE4LIB] use of OpenSearch response elements in libraries?
I too find this decision intriguing, and I'm wondering about its wider implications for the use of RSS/Atom as a container format inside and outside the context of OpenSearch as it relates to library systems.

I note that an OpenSearch description does not allow you to specify the type of the items contained within an RSS or Atom feed being advertised. As such, it's impossible to advertise multiple output formats within a single OpenSearchDescription (specifically, you can have only one Url element with type='application/rss+xml'). Therefore, clients consuming OpenSearch must be prepared to interpret the record types correctly, but cannot learn from the server a priori what those are. My guess would be that OCLC is shooting for OpenSearch consumers that expect RSS/Atom feeds and that have some generic knowledge of how to process items that contain, for instance, HTML, but at the same time are unprepared to handle MARCXML or other metadata formats. Examples may include Google Reader or the A9 metasearch engine. The alternative, SRU, carries no expectation that items be processed by clients that are unaware of library metadata formats. In addition, its 'explain' verb allows clients to learn which metadata formats they can request.

This may be reviving a discussion that, an Internet search shows, was very active in the community about 4 years ago; 4 years later, though, I was unable to find the outcome of that discussion, so it may be good to capture the current thinking. Which client applications currently consume OpenSearch results, and which consume SRU results? I understand that a number of ILS vendors besides OCLC have already provided, or are in the process of providing, web services interfaces to their catalogs; do they choose OpenSearch and/or SRU, or a heterogeneous mix in the way OCLC does? If they choose OpenSearch, do they use RSS or Atom feeds to carry metadata records?

- Godmar

On Tue, Jun 24, 2008 at 1:23 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: In general, is there a reason to have different metadata formats for SRU vs. OpenSearch? Is there a way to just have the same metadata formats available for each? Or are the demands of each too different to just use the same underlying infrastructure, such that it really does take more work to include a metadata format as an OpenSearch option even if it's already been included as an SRU option? Personally, I'd like these alternate access methods to still have the same metadata format options, if possible. And other options. Everything should be as consistent as possible to avoid confusion. Jonathan

Washburn, Bruce wrote: Godmar, I'm one of the developers working on the WorldCat API. My take is that the API is evolving and adapting as we learn more about how it's expected to be used. We haven't precluded the addition of more record metadata to OpenSearch responses; we opted not to implement it until we had more evidence of need. As you've noted, WorldCat API OpenSearch responses are currently limited to title and author information plus a formatted bibliographic citation, while more complete record metadata is available in DC or MARC XML in SRU responses. Until now we had not seen a strong push from the API early implementers for more record metadata in OpenSearch responses, based on direct feedback and actual use. I can see how it could be a useful addition, though, so we'll look into it. Bruce -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] use of OpenSearch response elements in libraries?
Thanks --- let me do some query refinement then -- does anybody know of examples where record metadata (e.g., MARCXML or DC) is returned as an OpenSearch response? [ If I understand the proposed WorldCat API correctly, OpenSearch is used only for pre-formatted citations in HTML. ]

- Godmar

On Tue, Jun 24, 2008 at 12:54 AM, Roy Tennant [EMAIL PROTECTED] wrote: I believe WorldCat qualifies, although the API is not yet ready for general release (but soon): http://worldcat.org/devnet/index.php/SearchAPIDetails Roy

On 6/23/08 8:55 PM, Godmar Back [EMAIL PROTECTED] wrote: Hi, are there any examples of functioning OpenSearch interfaces to library catalogs or library information systems? I'm specifically interested in those that not only advertise a text/html interface to their catalog, but that include OpenSearch response elements. One example I've found is Evergreen, though it's not clear to what extent this interface is used or implemented. For instance, their demo installation's OpenSearch description advertises an Atom feed, but what's returned doesn't validate. (*) Are there other examples deployed (and does anybody know of applications that consume OpenSearch feeds?) - Godmar

(*) See, for instance: http://demo.gapines.org/opac/extras/opensearch/1.1/PINES/atom-full/keyword/?searchTerms=music&startPage=&startIndex=&count=&searchLang which is not a valid Atom feed: http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fdemo.gapines.org%2Fopac%2Fextras%2Fopensearch%2F1.1%2FPINES%2Fatom-full%2Fkeyword%2F%3FsearchTerms%3Dmusic%26startPage%3D%26startIndex%3D%26count%3D%26searchLang
Re: [CODE4LIB] Open Source Repositories
Generally, you won't find a credible site that would allow you to upload unvetted binaries of adapted versions of low-volume software. The obvious risks are just too high. My recommendation would be a personal webpage, hosted on a site that's associated with a real-world institution, and a real-world contact.

- Godmar

On Fri, May 16, 2008 at 10:24 AM, Carol Bean [EMAIL PROTECTED] wrote: I probably should clarify that the friend is looking for a place to share what she's already fixed and compiled to run on a low-resource machine (both in Windows and Linux). Thanks, Carol

On Fri, May 16, 2008 at 9:52 AM, MJ Ray [EMAIL PROTECTED] wrote: Carol Bean [EMAIL PROTECTED] wrote: Does anyone know of open source repositories that have precompiled software? (Especially low-resource software.) As well as their own, most of the free software operating systems have third-party repositories, such as those listed at http://www.apt-get.org/ for debian. Make sure you trust the third-party provider, though! Regards, -- MJ Ray (slef) Webmaster for hire, statistician and online shop builder for a small worker cooperative http://www.ttllp.co.uk/ http://mjr.towers.org.uk/ (Notice http://mjr.towers.org.uk/email.html) tel:+44-844-4437-237 -- Carol Bean [EMAIL PROTECTED]
Re: [CODE4LIB] google books and OCLC numbers
Mark, I'll answer this one on list, but let's take discussion that is specifically related to the GBS classes off-list, since you're asking questions about this particular software --- I had sent the first email to Code4Lib because I felt that our method of integrating the Google Book viewability API into III Millennium in a clean way was worth sharing with the community.

On Thu, May 8, 2008 at 10:07 AM, Custer, Mark [EMAIL PROTECTED] wrote: Slide 4 in that PowerPoint mentions something about a small set of Google Book Search information, but it also says that the items are indexed by ISBN, OCLC#, and LCCN. And yet, during the admittedly brief time that I tried out this really nice demo, I was unable to find any links to books that were available in full view, which made me wonder if any of the search results were searching GBS with their respective OCLC #s (and not just ISBNs, if available).

GBS searches by whatever you tell it: ISBN, OCLC, *OR* LCCN. Not all of them.

For example, if I use the demo site that's provided and search for mark twain and limit my results to publication dates of, say, 1860-1910, I don't receive a single GBS link. So I checked to see if Eve's Diary was in GBS and, of course, it was... and then I made sure that the copy I found in the demo had the same OCLC# as the one in GBS; and it was. So, is this a feature that will be added later, or is it just that the entire set of bib records available at the demo site are not included in the GBS aspect of the demo?

By "demo site that's provided", do you mean addison.vt.edu:2082? Remember that in this demo, the link is only displayed if Google has a partial view, and *not* if Google has full text or no view. It's my understanding that Twain's books are past copyright, so Google has fully scanned them and they are available as full text. If you take that into account, Eve's Diary (OCLC# 01052228) works fine. I added it at the bottom of http://libx.org/gbs/tests.html To search for this book by OCLC, you'd use this span: <span title="OCLC:01052228" class="gbs-thumbnail gbs-link-to-preview gbs-if-partial-or-full">Eve's Diary</span> which links to the full text version. Note that --- interestingly --- Google does not appear to have a thumbnail for this book's cover.

Secondly, I have another question which I hope that someone can clear up for me. Again, I'll use this copy of Eve's Diary as an example, which has an OCLC number of 01052228. Now, if you search worldcat.org (using the advanced search, the basic search, or even adding things like oclc: before the number), the only way that I can access this item is to search for 1052228 (removing the leading zero). And this is exactly how the OCLC number displays in the metadata record, directly below the field that states that there are 18 editions of this work. All of that said, I can still access the book with either of these URLs: http://worldcat.org/wcpa/oclc/1052228 http://worldcat.org/wcpa/oclc/01052228 Now, I could've sworn that GBS followed a similar route, and so I previously searched it for OCLC numbers by removing any leading zeroes. As of at least today, though, the only way for me to access this book via GBS is to use the OCLC number as it appears in the MARC record... that is, by searching for oclc01052228. Has anyone else noticed this change in GBS (though it's quite possible that I'm simply mistaken)? And could anyone inform me about the technical details of any of these issues?
I mean, I get that WorldCat has to also deal with ISSNs, but is there a way to use the search box to explicitly declare what type of number the query is... and why would the value need to have any leading 0's removed in the metadata display (especially since the URL method can access either)?

That's a question about the search interface accessed at books.google.com, not about the book viewability API. Those are two different services. The viewability API advertises that it supports OCLC: and LCCN: prefixes to search for OCLC numbers and LCCNs, respectively, in addition to ISBNs, and that works in your example; for instance, visit: http://books.google.com/books?jscmd=viewapi&bibkeys=OCLC:01052228&callback=X or http://books.google.com/books?jscmd=viewapi&bibkeys=OCLC:1052228&callback=X

The books.google.com search interface doesn't advertise the ability to search by OCLC number --- the only reason you are successful with searching for OCLC01052228 is that this string happens to occur somewhere in this book's metadata description, and Google has the full content of the metadata descriptions indexed like it indexes webpages. Take also a look at the advanced search interface at: http://books.google.com/advanced_book_search You'll find no support for OCLC or LCCN. It does show, however, that isbn: can be used to search for ISBNs, in the style in which prefixes are used in other search interfaces.

- Godmar
[CODE4LIB] google books for III millennium
Hi, here's a pointer to follow up on the earlier discussion of how to integrate the Google Book viewability API into closed legacy systems that allow only limited control over what is being output, such as III's Millennium system. Compared to other solutions, no JavaScript programming is required, and the integration into the vendor-provided templates (such as briefcit.html etc.) is reasonably clean, provides targeted placement, and allows for multiple uses per page. Slides (excerpted from Annette Bailey's presentation at IUG 2008): http://libx.org/gbs/GBSExcerptFromIUGTalk2008.ppt A demo is currently available here: http://addison.vt.edu:2082/ - Godmar
[CODE4LIB] coverage of google book viewability API
Hi, to examine the usability of Google's book viewability API when lookup is done via ISBN, we did some experiments, the results of which I'd like to share. [1]

For 1,000 randomly drawn ISBNs from the 3,192,809 ISBNs extracted from a snapshot of LoC's records [2], Google Books returned results for 852 ISBNs. We then downloaded the page referred to in the info_url parameter of the response (the "About" page Google provides) for each result. To examine whether Google retrieved the correct book, we checked whether the info page contained the ISBN for which we'd searched. 815 out of 852 contained the same ISBN; 37 results referred to a different ISBN than the one searched for. We examined the 37 results manually: 33 referred to a different edition of the book whose ISBN was used to search, as judged by comparing author/title information with OCLC's xISBN service. (We compared the author/title returned by xISBN with the author/title listed on Google's book information page.) 4 records appeared to be misindexed.

I found the results surprisingly high: 85.2% recall (852 of 1,000) and, allowing for the ISBN substitution, 99% precision (848 of 852), with a 3.1% margin of error.

- Godmar

[1] http://top.cs.vt.edu/~gback/gbs-accuracy-study/ [2] http://www.archive.org/details/marc_records_scriblio_net
Re: [CODE4LIB] google books for III millennium
Kent, the link you provide is for the Google AJAX Search API --- however, I was referring to the Google Book viewability API. They're unrelated, to my knowledge. My experience with the Google Book viewability API is that it can be invoked server-side (Google's terms notwithstanding), but it requires a user-agent that mimics an existing browser. A user agent such as the one provided by Sun's JDK (I think it's jdk-1.6 or some such) will be rejected; a referrer URL, on the other hand, does not appear to be required.

- Godmar

On Tue, May 6, 2008 at 6:32 PM, Kent Fitch [EMAIL PROTECTED] wrote: Hi Jonathan, The Google API can now be invoked guilt-free from server-side; see: http://code.google.com/apis/ajaxsearch/documentation/#fonje For Flash developers, and those developers that have a need to access the AJAX Search API from other Non-Javascript environments, the API exposes a simple RESTful interface. In all cases, the method supported is GET and the response format is a JSON encoded result set with embedded status codes. Applications that use this interface must abide by all existing terms of use. An area to pay special attention to relates to correctly identifying yourself in your requests. Applications MUST always include a valid and accurate http referer header in their requests. In addition, we ask, but do not require, that each request contains a valid API Key. By providing a key, your application provides us with a secondary identification mechanism that is useful should we need to contact you in order to correct any problems. Well, guilt-free if you agree to the terms, which include: The API may be used only for services that are accessible to your end users without charge. You agree that you will not, and you will not permit your users or other third parties to: (a) modify or replace the text, images, or other content of the Google Search Results, including by (i) changing the order in which the Google Search Results appear, (ii) intermixing Search Results from sources other than Google, or (iii) intermixing other content such that it appears to be part of the Google Search Results; or (b) modify, replace or otherwise disable the functioning of links to Google or third party websites provided in the Google Search Results. Regards, Kent Fitch

On Wed, May 7, 2008 at 7:53 AM, Jonathan Rochkind [EMAIL PROTECTED] wrote: This is interesting. These slides don't give me quite enough info to figure out what's going on (I hate reading slides by themselves!), but I'm curious about this statement: Without JavaScript coding (even though Google's API requires JavaScript coding as it is). Are you making calls server-side, or are you still making them client-side? As you may recall, one issue I keep beating upon is the desire to call Google's API server-side. While it's technically possible to call it server-side, Google doesn't want you to. I wonder if this is what they're doing there? The problems with that are: 1) It may violate Google's terms of service 2) It may run up against Google traffic-limiting defenses 3) [Google's given reason]: It doesn't allow Google to tailor the results to the end-user's location (determined by IP). Including an x-forwarded-for header _may_ get around #2 or #3. Including an x-forwarded-for header should probably be considered a best practice when doing this sort of thing server-side in general, but I'm still nervous about doing this, and wish that Google would just plain say they allow server-side calls.
Godmar Back wrote: Hi, here's a pointer to follow up on the earlier discussion on how to integrate Google books viewability API into closed legacy systems that allow only limited control regarding what is being output, such as III's Millennium system. Compared to other solutions, no JavaScript programming is required, and the integration into the vendor-provided templates (such as briefcit.html etc.) is reasonably clean, provides targeted placement, and allows for multiple uses per page. Slides (excerpted from Annette Bailey's presentation at IUG 2008): http://libx.org/gbs/GBSExcerptFromIUGTalk2008.ppt A demo is currently available here: http://addison.vt.edu:2082/ - Godmar -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] google books for III millennium
The solution is entirely client-side, as it has to be for this particular kind of legacy system. (In some so-called turn-key versions, this particular company does not even provide access to the server's file system, let alone the option of running any services.) We had already discussed how it works (check the threads from March); this post was simply a pointer to how to integrate it into this particular system (since there were doubts back then about how hard or easy such integration is.)

- Godmar

On Tue, May 6, 2008 at 5:53 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote: This is interesting. These slides don't give me quite enough info to figure out what's going on (I hate reading slides by themselves!), but I'm curious about this statement: Without JavaScript coding (even though Google's API requires JavaScript coding as it is). Are you making calls server-side, or are you still making them client-side? As you may recall, one issue I keep beating upon is the desire to call Google's API server-side. While it's technically possible to call it server-side, Google doesn't want you to. I wonder if this is what they're doing there? The problems with that are: 1) It may violate Google's terms of service 2) It may run up against Google traffic-limiting defenses 3) [Google's given reason]: It doesn't allow Google to tailor the results to the end-user's location (determined by IP). Including an x-forwarded-for header _may_ get around #2 or #3. Including an x-forwarded-for header should probably be considered a best practice when doing this sort of thing server-side in general, but I'm still nervous about doing this, and wish that Google would just plain say they allow server-side calls.

Godmar Back wrote: Hi, here's a pointer to follow up on the earlier discussion on how to integrate Google books viewability API into closed legacy systems that allow only limited control regarding what is being output, such as III's Millennium system. Compared to other solutions, no JavaScript programming is required, and the integration into the vendor-provided templates (such as briefcit.html etc.) is reasonably clean, provides targeted placement, and allows for multiple uses per page. Slides (excerpted from Annette Bailey's presentation at IUG 2008): http://libx.org/gbs/GBSExcerptFromIUGTalk2008.ppt A demo is currently available here: http://addison.vt.edu:2082/ - Godmar -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
Re: [CODE4LIB] coverage of google book viewability API
On Tue, May 6, 2008 at 11:02 PM, Michelle Watson [EMAIL PROTECTED] wrote: Is there something in the code that prevents the link from being offered unless it goes to at least a partial preview (which I take to mean scanned pages), or have I just been lucky in my searching? I can't comment on whether or not the 'no preview' is useful because every book I see has some scanned content.

Yes, in Annette's example, the link is only offered if Google has preview pages in addition to the book information. See the docs on libx.org/gbs for further detail (look for gbs-if-partial).

I had the same subjective impression: I was surprised by how many books have previews. For instance, if I search for genomics on addison.vt.edu:2082, 24 of the first 50 hits returned have partial previews. Incidentally, 2 out of the 24 lead to the wrong book. This is why I sampled the LoC's ISBN set. It's likely that there's observer bias (such as trying genomics), and it's also possible that Google is more likely to have previews for books libraries tend to hold, such as popular or recent books. (I note that most of the 24 hits for genomics that have previews are less than 4 years old.) Conversely, for those recent years, precision may be lower, with more books misindexed.

- Godmar
[CODE4LIB] how to obtain a sampling of ISBNs
Hi, for an investigation/study, I'm looking to obtain a representative sample set (say, a few hundred) of ISBNs. For instance, the sample could represent LoC's holdings (or some other acceptable/meaningful population in the library world). Does anybody have any pointers/ideas on how I might go about this? Thanks! - Godmar
Re: [CODE4LIB] how to obtain a sampling of ISBNs
Hi, thanks to everybody who's replied with offers to provide ISBNs. I need to clarify that I'm looking for a sample of ISBNs that is representative of some larger population, such as all books cataloged by LoC, all books in library X's catalog, or all books sold by Amazon. It could be, for instance, a simple random sample [1]. What will not work are ISBNs coming from a FRBR service, from a specialized collection, or the first n ISBNs coming from a catalog dump (unless the order in which the catalog database is dumped is explicitly random).

- Godmar

[1] http://en.wikipedia.org/wiki/Simple_random_sample

On Mon, Apr 28, 2008 at 10:40 AM, Shanley-Roberts, Ross A. Mr. [EMAIL PROTECTED] wrote: I could give you any number of sets of ISBNs. What kind of material are you interested in: videos, books, poetry, electronic resources, etc.? Or I could supply a set of ISBNs for any subject area or LC classification area that you might be interested in. Ross Ross Shanley-Roberts Special Projects Technologist Miami University Libraries Oxford, OH 45056 [EMAIL PROTECTED] 847 672-9609 847 894-3911 cell

-Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Godmar Back Sent: Monday, April 28, 2008 8:35 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] how to obtain a sampling of ISBNs Hi, for an investigation/study, I'm looking to obtain a representative sample set (say a few hundred) of ISBNs. For instance, the sample could represent LoC's holdings (or some other acceptable/meaningful population in the library world). Does anybody have any pointers/ideas on how I might go about this? Thanks! - Godmar
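Given a full catalog dump, drawing a simple random sample is straightforward. A sketch in Python (the file name 'isbns.txt' is hypothetical; one ISBN per line is assumed):

```python
# Draw a simple random sample of ISBNs from a full catalog dump,
# one ISBN per line ('isbns.txt' is a hypothetical file name).
import random

with open("isbns.txt") as f:
    population = [line.strip() for line in f if line.strip()]

sample = random.sample(population, 300)  # "a few hundred", per the request
for isbn in sample:
    print(isbn)
```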
Re: [CODE4LIB] Serials Solutions 360 API - PHP classes?
Could you share, briefly, what this API actually does (if doing so doesn't violate your NDA?) - Godmar On Thu, Apr 3, 2008 at 1:40 PM, Yitzchak Schaffer [EMAIL PROTECTED] wrote: From: Code for Libraries on behalf of Yitzchak Schaffer Sent: Wed 4/2/2008 12:28 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Serials Solutions 360 API - PHP classes? Does anyone have/know of PHP classes for searching the Serials Solutions 360 APIs, particularly Search? Okay, having not heard any affirmatives, I'm starting work on this. I'm an OOP and PHP noob, so I'm donning my flak jacket/dunce cap in advance, but I'll try to make this as useful to the community and comprehensive as time and my ability allow. Assuming that Serials Solutions will allow some kind of sharing for these - they make clients sign a NDA before they show you the docs. I'm waiting to hear their response; I would be surprised if they wouldn't allow sharing of something like this among clients. -- Yitzchak Schaffer Systems Librarian Touro College Libraries 33 West 23rd Street New York, NY 10010 Tel (212) 463-0400 x230 Fax (212) 627-3197 [EMAIL PROTECTED]
Re: [CODE4LIB] Google Book Search API - JavaScript Query
On Thu, Mar 20, 2008 at 12:44 PM, KREYCHE, MICHAEL [EMAIL PROTECTED] wrote: -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Godmar Back Sent: Thursday, March 20, 2008 10:45 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Google Book Search API - JavaScript Query Have you tried placing your code in a window.onload handler? Read the example I created at libx.org/gbs and if that works for you in IE6, use the technique there. (Or you may just use the entire script - it seems you're reimplementing a lot of it anyway.) I'll have to study that a bit and see how it works. I was aiming for a solution with a minimal amount of code, but perhaps a more robust approach like yours is in order.

I looked into this issue some more and would like to share a bit of what I learned. The short answer is: use jQuery (or a library like it).

The longer answer is that window.onload - even the browser-compatible version using the addEvent function described here: http://www.dustindiaz.com/rock-solid-addevent/ - won't fire until all images on a page have loaded, which can incur significant latency, especially if there's a large number of embedded objects from different origins on the page, some of which may be stragglers when loading. Instead, what you want is a browser-compatible notification when the HTML content has been downloaded and parsed into the DOM tree structure, that is, when the document is "ready" and it's safe to manipulate it using such methods as getElementById(). Implementing this notification requires a variety of browser-specific hacks (google for details, or examine 'bindReady' in jQuery for a distilled summary of the collective experience). jQuery implements those hacks and hides them from you, so in jQuery it's as simple as saying $(function () { /* insert work here */ }); jQuery will determine when the document is ready and execute your anonymous function then, which is at the earliest possible time. If no hack is known for a particular platform, jQuery falls back to the load handler. If you think about it, that's not something you want to implement or even think about yourself.

(Note that jQuery's init constructor, which is what the $ symbol is bound to, adjusts its behavior based on the type of its first argument. If the argument is a function, it means: call this function when the document is ready. An alternate syntax is $().ready(function ...), which relies on jQuery substituting 'document' if the first argument is not given. The most readable syntax may be $(document).ready(function ...), though $(function ...) may make for a good idiom.)

- Godmar
Re: [CODE4LIB] Google Book Search API - JavaScript Query
I didn't mean window.onload literally; use a browser-compatible version of it. [jQuery, btw, would figure that out automatically for you, so if you can integrate jQuery in your page, you may want to try Matt's plugin.] My prototype uses a function called addEvent from Dustin Diaz; see http://www.dustindiaz.com/rock-solid-addevent I think it uses 'attachEvent' in IE6, which appears to work. I'm also using it in MAJAX (libx.org/majax) and it works there as well in IE6.

- Godmar

On Thu, Mar 20, 2008 at 11:22 AM, David Kane [EMAIL PROTECTED] wrote: Hi Godmar, Thanks. Yes. I tried that, but the support for window.onload does not exist in IE6. I also tried the defer="defer" attribute in the script tag, which did not work either. Tim's solution looks good. I have yet to try it though (will wait until after Easter). Cheers, David

On 20/03/2008, Godmar Back [EMAIL PROTECTED] wrote: Have you tried placing your code in a window.onload handler? Read the example I created at libx.org/gbs and if that works for you in IE6, use the technique there. (Or you may just use the entire script - it seems you're reimplementing a lot of it anyway.) - Godmar

On Thu, Mar 20, 2008 at 9:09 AM, KREYCHE, MICHAEL [EMAIL PROTECTED] wrote: Tim and David, Thanks for sharing your solutions; the IE problem has been driving me crazy. I've mostly been working on the title browse page of our catalog. Originally I had it working on Firefox, Safari, and IE7 (IE6 worked if I refreshed the page); after some rearrangement of the script, it's now working on IE6 but broken on Safari. This is still proof-of-concept code and is only on our staging server (http://kentlink.kent.edu:2082/). Try a keyword search and you should see some Google links. Mike -- Michael Kreyche Systems Librarian / Associate Professor Libraries and Media Services Kent State University 330-672-1918

-Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Tim Hodson Sent: Thursday, March 20, 2008 7:21 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Google Book Search API - JavaScript Query One way I have used to resolve this is to poll until the object exists before continuing:

function myInit() {
    // if myObj is not defined yet, try again shortly
    if (typeof myObj == "undefined") {
        createScript();
        setTimeout(myInit, 60);  // pass the function; don't call it here
        return;
    }
    // do stuff only once myObj is an object
    else if (typeof myObj == "object") {
        myGo();
        return;
    }
}

HTH Tim

On 20/03/2008, David Kane [EMAIL PROTECTED] wrote: Hi folks, We were one of the first libraries to get the GBS API working on our OPAC. Like many OPACs, ours is difficult to modify at times and requires a dynamic insert of a generated (by PHP) JavaScript, which is hosted on a separate server to the OPAC pages. It seems to work fine on most browsers, giving an appropriate link to a full/partial text preview of that work on GBS. I run into a problem with IE6, which means that the functions defined in the JavaScript aren't available by the time the script is called at the bottom of the page. You should be able to see a GBS link on most pages; here is an example: http://witcat.wit.ie/search/i?SEARCH=0192833987 The attached image shows you what you should see. If anyone can shed any light on this, it would be appreciated. Thanks and best regards, David Kane Systems Librarian Waterford Institute of Technology Ireland -- Tim Hodson www.informationtakesover.co.uk www.timhodson.co.uk
Re: [CODE4LIB] Free covers from Google
FWIW, realize that this is a client-side mashup. Google will see individual requests from individual IP addresses from everybody viewing your page. For each IP address from which it sees requests, it'll decide whether to block or not. It'll block if it thinks you're harvesting their data. Wageningen University owns the 137.224/16 network, so I find it doubtful that you're all sharing the same IP address. It's probably just your desktop IP address (or, if you're behind a NAT device, the address used by that device - but that's probably only a small group of computers). That makes it even more concerning that Google's defenses could be triggered by your development and testing activities. Do complain about it to them. (I doubt they'll change their logic, but you can try.)

I've received the CAPTCHA from Google in the past a few times when I use it as a calculator. Enter more than a dozen or so expressions, and it thinks I'm a computer who needs help from Google to compute simple things such as English-to-metric conversions. I think that's a huge drawback, actually. How does Amazon's image service work? Does it suffer from the same issue?

- Godmar

On Mon, Mar 17, 2008 at 4:50 AM, Boheemen, Peter van [EMAIL PROTECTED] wrote: As I wrote earlier, I have implemented a link using the Google API in our library catalog. It worked ... for a while :) What we notice now is that Google responds with an error message. It thinks that it has detected spyware or some virus. I see the same effect now when I click on the examples Godmar and Tim created. When I go to Google Books directly with my browser now, I get the same message and the request to enter a non-machine-readable string, and then I can go on. My API calls, however, still fail. This has probably got to do with the fact that anybody who is accessing Google from the university campus exposes the same IP address to Google. This is probably a trigger for Google to respond with this error. Does anybody have any ideas about what to do about this, before I try to get in touch with Google? Peter van Boheemen Wageningen University and Research Library The Netherlands
Re: [CODE4LIB] Free covers from Google
Although I completely agree that server-side queryability is something we should ask from Google, I'd like to follow up on:

On Mon, Mar 17, 2008 at 11:06 AM, Jonathan Rochkind [EMAIL PROTECTED] wrote: The architecture of SFX would make it hard to implement Google Books API access as purely client javascript, without losing full integration with SFX on par with other 'services' used by SFX. We will see what happens.

Could you elaborate? Do you mean 'hard' or 'impossible'?

Meanwhile, I've extended the Google Book classes (libx.org/gbs) to provide more flexibility; it now supports these classes:

gbs-thumbnail - include an <img> embedding the thumbnail image
gbs-link-to-preview - wrap span in link to preview at GBS
gbs-link-to-info - wrap span in link to info page at GBS
gbs-link-to-thumbnail - wrap span in link to thumbnail at GBS
gbs-if-noview - keep this span only if GBS reports that the book's viewability is 'noview'
gbs-if-partial-or-full - keep this span only if GBS reports that the book's viewability is at least 'partial'
gbs-if-partial - keep this span only if GBS reports that the book's viewability is 'partial'
gbs-if-full - keep this span only if GBS reports that the book's viewability is 'full'
gbs-remove-on-failure - remove this span if GBS doesn't return bookInfo for this item

- Godmar
Re: [CODE4LIB] Free covers from Google
On Mon, Mar 17, 2008 at 11:13 AM, Tim Spalding [EMAIL PROTECTED] wrote: limits. I don't think it's a strict hits-per-day, I think it's heuristic software meant to stop exactly what we'd be trying to do, server-side machine-based access. Aren't we still talking about covers? I see *no* reason to go server-side on that. Browser-side gets you what you want—covers from Google—without the risk they'll shut you down over overuse. But Peter's experience says otherwise, no? His computer was shut down during development - I don't see how Google would tell his use from the use of someone doing research using a library catalog. Especially if NAT is used with a substantial number of users as in Giles's use case. - Godmar
Re: [CODE4LIB] jquery plugin to grab book covers from Google and link to Google books
Good, but why limit it to one class per span? My proposal separates different functionality into multiple classes, allowing the user to mix and match. If you limit yourself to one class, you have to provide classes for all possible combinations a user might want, such as gbsv-link-to-preview-with-thumbnail.

- Godmar

On Mon, Mar 17, 2008 at 4:30 PM, Bess Sadler [EMAIL PROTECTED] wrote: Matt Mitchell here at UVa just wrote a jquery plugin to access google book covers and link to google books. I wrote up how to use it here: http://www.ibiblio.org/bess/?p=107 We're using it as part of Blacklight, and we're making it available through the Blacklight source code repository under an Apache 2.0 license. First, grab the plugin here: http://blacklight.rubyforge.org/svn/javascript/gbsv-jquery.js, and download jquery here: http://code.google.com/p/jqueryjs/downloads/detail?name=jquery-1.2.3.min.js. Now make yourself some HTML that looks like this:

<html>
<head>
<script type="text/javascript" src="jquery-1.2.3.min.js"></script>
<script type="text/javascript" src="gbsv-jquery.js"></script>
<script type="text/javascript">
$(function(){
  $.GBSV.init();
});
</script>
</head>
<body>
<span title="ISBN:0743226720" class="gbsv-link-to-preview"></span>
<span title="ISBN:0743226720" class="gbsv-link-to-info"></span>
<span title="ISBN:0743226720" class="gbsv-thumbnail"></span>
<span title="ISBN:0743226720" class="gbsv-link-to-preview-with-thumbnail"></span>
</body>
</html>

Now load your page and you should see something like this: http://blacklight.rubyforge.org/gbsv.html If you link to a non-existent ISBN it will be silently ignored. Give it a shot and give us some feedback! Bess

Elizabeth (Bess) Sadler Research and Development Librarian Digital Scholarship Services Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904 [EMAIL PROTECTED] (434) 243-2305
Re: [CODE4LIB] many processes, one result
If you're doing this in Java, use the java.util.concurrent package and its Executor and Future framework, instead of using Thread.start/join, synchronized, etc. directly. Get the book Concurrent Programming in Java: Design Principles and Patterns (ISBN 0-201-31009-0), written by the master himself (Doug Lea; see http://gee.cs.oswego.edu/dl/cpj/). A short sketch follows at the end of this message. - Godmar

On Feb 18, 2008 2:19 PM, Durbin, Michael R [EMAIL PROTECTED] wrote: This can be done in Java, but like everything in Java the solution is kind of lengthy and perhaps requires several classes. I've attached a simple skeleton program that spawns threads to search but then processes only those results returned in the first 10 seconds. The code for performing the searches is obviously missing, as is the consolidation code, but the concurrency issue is addressed. In this example the search threads aren't killed, but instead left running to finish naturally, though their results would be ignored if they weren't done in 10 seconds. It might be better to kill them, depending on the circumstances. -Mike

-Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Eric Lease Morgan Sent: Monday, February 18, 2008 1:43 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] many processes, one result

How do I write a computer program that spawns many processes but returns one result? I suppose the classic example of my query is the federated search. Get user input. Send it to many remote indexes. Wait. Combine results. Return. In this scenario, when one of the remote indexes is slow, things grind to a halt. I have a more modern example. Suppose I want to take advantage of many Web Services. One might be a spell checker. Another might be a thesaurus. Another might be an index. Another might be a user lookup function. Given this environment, where each Web Service will return different sets of streams, how do I query each of them simultaneously and then aggregate the results? I don't want to do this sequentially. I want to fork them all at once and wait for their return before a specific timeout. In Perl I can use the system command to fork a process, but I must wait for it to return. There is another Perl command allowing me to fork a process and keep going, but I don't remember what it is. Neither of these solutions seems feasible. Is the idea of threading in Java supposed to be able to address this problem? -- Eric Lease Morgan University Libraries of Notre Dame (574) 631-8604
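A minimal sketch of the java.util.concurrent approach Godmar recommends, applied to Eric's federated-search scenario. This is an editor's illustration; the searchIndex helper and the index names are hypothetical placeholders for real remote calls.

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import java.util.concurrent.*;

  public class FederatedSearch {

      // Hypothetical stand-in for a query against one remote index.
      static List<String> searchIndex(String index, String query) {
          return Arrays.asList(index + " result for '" + query + "'");
      }

      public static void main(String[] args) throws InterruptedException {
          final String query = "open source";
          List<String> indexes = Arrays.asList("catalog", "spellcheck", "thesaurus");

          ExecutorService pool = Executors.newFixedThreadPool(indexes.size());
          List<Callable<List<String>>> tasks = new ArrayList<Callable<List<String>>>();
          for (final String index : indexes) {
              tasks.add(new Callable<List<String>>() {
                  public List<String> call() {
                      return searchIndex(index, query);
                  }
              });
          }

          // invokeAll blocks until every task completes or 10 seconds
          // elapse; tasks still running at the deadline are cancelled.
          List<Future<List<String>>> futures =
              pool.invokeAll(tasks, 10, TimeUnit.SECONDS);

          List<String> combined = new ArrayList<String>();
          for (Future<List<String>> f : futures) {
              try {
                  combined.addAll(f.get()); // finished within the deadline
              } catch (CancellationException e) {
                  // this source timed out; skip it
              } catch (ExecutionException e) {
                  // this source failed; skip it
              }
          }
          pool.shutdown();
          System.out.println(combined);
      }
  }

Compared with hand-rolled Thread.start/join, the framework gives you the deadline, the result collection, and the cancellation of stragglers essentially for free.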
Re: [CODE4LIB] xml java package
On Jan 27, 2008 5:40 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote: What is the most respected (useful, understandable) XML Java package? In a few fits of creative rage, I have managed to write my first Java programs. I can now index plain text files with Lucene and search the index. I can parse MARC files with MARC4J, index them with Lucene, and search the index. I can dump the results of the OAI-PMH ListRecords and Identify verbs using harvest2 from OCLC. I now need to read XML. Unlike indexing and doing OAI-PMH, there are a myriad of tools for reading and writing XML. I've done SAX before. I think I've done a bit of DOM. If I wanted a straight-forward and well-supported Java package that supported these APIs, then what package might I use?

If the data you're manipulating is partially or fully described by a Schema or DTD, consider using a package such as Castor (castor.org) that generates classes which store your XML data as Java beans. In that case, you get XML parsing, XML generation, and even validation for free, that is, using only about 3 lines of code (see the sketch below). If you don't have a Schema, consider creating one or asking the data provider for one - compared to using SAX or a DOM-like API, the gain in productivity and robustness is significant. We're using Castor extensively in the LibX edition builder - for our own configuration data, which is stored as XML, but also for accessing a number of OCLC services, including the OpenURL registry (which has a complete Schema!), the Worldcat registry (partial schema for SRW), and the OCLC institution profiles (no Schema :-(, so slightly more awkward). - Godmar
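A sketch of the "about 3 lines" in question, as an editor's illustration: it assumes Castor's source generator has already produced a Book class from your schema, and the Book element and its title field are hypothetical. Generated method names vary slightly across Castor versions.

  import java.io.FileReader;
  import java.io.FileWriter;

  public class CastorDemo {
      public static void main(String[] args) throws Exception {
          // Castor-generated classes expose static unmarshal() and
          // instance marshal() methods; validation against the schema
          // happens during unmarshalling by default.
          Book book = Book.unmarshal(new FileReader("book.xml"));
          book.setTitle("Pride and Prejudice"); // ordinary Java bean accessors
          book.marshal(new FileWriter("book-out.xml"));
      }
  }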
Re: [CODE4LIB] xml java package
I haven't used Castor for mixed content, but obviously, mixed content is more difficult to map to Java types, even if you have a schema. I probably wouldn't use Castor in those situations. Otherwise, it - or a tool like it that can map schemata to Java types for automatic parsing, generation, and validation - should still be your first choice. - Godmar On Feb 1, 2008 11:22 AM, Clay Redding [EMAIL PROTECTED] wrote: I don't know if it's still the case, but I know a recent EAD project that tried to use Castor said that it had problems with mixed content models. -- Clay On Feb 1, 2008, at 10:50 AM, Riley, Jenn wrote: -Original Message- I now need to read XML. Unlike indexing and doing OAI-PMH, there are a myriad of tools for reading and writing XML. I've done SAX before. I think I've done a bit of DOM. If I wanted a straight-forward and well-supported Java package that supported these APIs, then what package might I use? If the data you're manipulating is partially or fully described by a Schema or DTD, consider using a package such as Castor (castor.org) I think I recall hearing in the past that Castor had trouble with XML files that used mixed content models (a set into which TEI and EAD both fall) - can anyone confirm if that's currently the case (or that it never was and I'm completely misremembering)? Jenn Jenn Riley Metadata Librarian Digital Library Program Indiana University - Bloomington Wells Library W501 (812) 856-5759 www.dlib.indiana.edu Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
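For readers unfamiliar with the term: mixed content means elements whose children interleave character data with child elements, which is typical of TEI and EAD documents. A tiny illustration (an editor's example, not from the thread):

  <p>See <title>Emma</title>, chapter <num>3</num>, for details.</p>

A data-binding tool like Castor wants to map each element to a bean property; here the text fragments between the child elements have no natural property to land in, which is why document-oriented schemas fit this approach poorly.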
Re: [CODE4LIB] arg! classpaths! [resolved]
To add a bit of experience gained from 13 years of Java programming: I strongly recommend against setting CLASSPATH in the shell. Instead, use either the -cp switch to java, as in

  java -cp lucene-core...jar:lucene-demo-.jar

or use the env command in Unix, as in

  env CLASSPATH=/home/eric/lucene/lucene-core-2.3.0.jar:/home/eric/lucene/lucene-demos-2.3.0.jar java

These options achieve the same effect, but unlike export, they will not change the CLASSPATH environment variable for the remainder of your shell session. For instance, this command:

  export CLASSPATH=/home/eric/lucene/lucene-core-2.3.0.jar:/home/eric/lucene/lucene-demos-2.3.0.jar

will make it impossible to execute javac or java for .class files in the current directory (because you've excluded . from the classpath, which is included by default). Note, however, that this rule does not apply to shell scripts: inside shell scripts, it's okay to export CLASSPATH because such settings will be valid only for the shell executing the script; in Unix, changes to environment variable will not reflect back to the shell from the shell script was started. - Godmar

You use a plain directory as a CLASSPATH component only if you intend to use .class files that have not been packaged up in a JAR.

Thank you for the prompt replies. Yes, my CLASSPATH needed to be more specific; it needed to specify the .jar files explicitly. I can now run the demo. (Arg! Classpaths!) -- ELM
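If you need both the jars and locally compiled classes, a minimal sketch (the MySearcher class name is a hypothetical placeholder) is to list the current directory explicitly:

  java -cp .:/home/eric/lucene/lucene-core-2.3.0.jar:/home/eric/lucene/lucene-demos-2.3.0.jar MySearcher

The -cp value replaces the default classpath entirely, so '.' must be spelled out whenever you pass the switch.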
Re: [CODE4LIB] arg! classpaths! [resolved]
On Jan 26, 2008 10:12 AM, Godmar Back [EMAIL PROTECTED] wrote: Note, however, that this rule does not apply to shell scripts: inside shell scripts, it's okay to export CLASSPATH because such settings will be valid only for the shell executing the script; in Unix, changes to environment variable will not reflect back to the shell from the shell script was started.

Oops, that should read: ... changes to environment variables will not reflect back to the shell from *which* the shell script was started. I should also mention that if you place an export CLASSPATH command in your ~/.bash_profile or ~/.bashrc, you've committed the same mistake, because the setting will then be valid for your initial shell session (or for every new session, or both, depending on the content of your ~/.bash_profile). So ignore any instructions that propose you do that. - Godmar