Re: [CODE4LIB] COinS

2012-11-20 Thread Godmar Back
Could you elaborate on your belief that COinS is "actually illegal in
HTML5"? Why would that be so?

 - Godmar



On Tue, Nov 20, 2012 at 5:20 PM, Jonathan Rochkind  wrote:

> It _IS_ an old unused metadata format that should be replaced by something
> else (among other reasons because it's actually illegal in HTML5), but I'm
> not sure there is a "something else" with the right balance of flexibility,
> simplicity, and actual adoption by consuming software.
>
> But COinS didn't have a whole lot of adoption by consuming software
> either. Can you say what you think the COinS you've been adding are useful
> for, what they are getting used for? And what sorts of 'citations' you were
> adding them for? For my own curiosity, and because it might help answer if
> there's another solution that would still meet those needs.
>
> But if you want to keep using COinS -- creating a COinS generator like
> OCLC's no longer existing one is a pretty easy thing to do, perhaps some
> code4libber reading this will be persuaded to find the time to create one
> for you and others. If you have a server that could host it, you could
> offer that. :)
>
>
>
>
> On 11/20/2012 4:47 PM, Bigwood, David wrote:
>
>> I've used the COinS Generator at OCLC for years. Now it is gone. Any
>> suggestions on how I can get an occasional COinS for use in our
>> bibliography? Do any of the citation managers generate COinS?
>>
>>
>>
>> Or is this just an old unused metadata format that should be replaced by
>> something else?
>>
>>
>>
>> Thanks,
>>
>> Dave Bigwood
>>
>> dbigw...@hou.usra.edu
>>
>> Lunar and Planetary Institute
>>
>>
>>


Re: [CODE4LIB] COinS

2012-11-20 Thread Godmar Back
Funny this topic comes up right now.

A few days ago, Wikipedia (arguably the biggest provider of COinS) decided
to discontinue it because they discovered that generating the COinS
using their decrepit infrastructure uses up so much processing power that
attempts to edit pages with lots of citations time out. See [1, 2]. That
said, there is some movement to restore them once they get their act
together and improve their infrastructure. The big irony is that this move
was driven by editors and regular contributors (it doesn't affect anyone
not "signed into" Wikipedia); that is, exactly those users who *ought* to
make the most regular use of COinS to actually retrieve cited material...

Just by coincidence, we finally embarked on a project to better process
COinS. As is, we're just linking to the OpenURL resolver, which is hit and
miss - that said, it's a facility that's used. We're now keeping
statistics, and for just 10 editions we've had over 5,000 clicks in the
last three months alone.  But we have additional options - Link/360 being
one for Serials Solutions clients, and Summon another. We think we can do a
much better job at resolving COinS with a combination of these services.
None of this depends on the specific COinS format, of course - any suitable
microformat would work, too.

 - Godmar

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=19262
[2] https://en.wikipedia.org/wiki/Template_talk:Citation/core#Disappointed


On Tue, Nov 20, 2012 at 4:47 PM, Bigwood, David wrote:

> I've used the COinS Generator at OCLC for years. Now it is gone. Any
> suggestions on how I can get an occasional COinS for use in our
> bibliography? Do any of the citation managers generate COinS?
>
>
>
> Or is this just an old unused metadata format that should be replaced by
> something else?
>
>
>
> Thanks,
>
> Dave Bigwood
>
> dbigw...@hou.usra.edu
>
> Lunar and Planetary Institute
>


Re: [CODE4LIB] Book metadata source

2012-10-26 Thread Godmar Back
If it's only in the hundreds, why not just look them up in Worldcat via
their basic search API and pull the ISBNs from the xISBN service? That's
quickly scripted.
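
To make "quickly scripted" concrete, here's roughly what I have in mind -
an untested sketch; the endpoint paths, parameter names, and response
fields are from memory of the WorldCat/xID services, so check them against
the API docs, and the wskey is a placeholder:

# Sketch: title/author -> WorldCat OpenSearch -> OCLC number -> xISBN ISBNs.
# Endpoints and parameters are my recollection of the services; verify them.
import json
import urllib
import urllib2
from xml.etree import ElementTree

WSKEY = "YOUR_WORLDCAT_KEY"  # placeholder

def find_oclc_number(title, author):
    """Return the OCLC number of the first WorldCat hit, or None."""
    params = urllib.urlencode({
        "q": "%s %s" % (title, author),
        "wskey": WSKEY,
        "count": "1",
    })
    url = "http://www.worldcat.org/webservices/catalog/search/opensearch?" + params
    feed = ElementTree.parse(urllib2.urlopen(url))
    # the Atom entries carry an <oclcterms:recordIdentifier> element
    rid = feed.getroot().find(".//{http://purl.org/oclc/terms/}recordIdentifier")
    return rid.text if rid is not None else None

def isbns_for(oclcnum):
    """Ask the xID service for the ISBNs of all editions of that record."""
    url = ("http://xisbn.worldcat.org/webservices/xid/oclcnum/%s"
           "?method=getMetadata&format=json&fl=isbn" % oclcnum)
    reply = json.load(urllib2.urlopen(url))
    return [isbn for rec in reply.get("list", []) for isbn in rec.get("isbn", [])]

for title, author in [("Mansfield Park", "Austen")]:
    num = find_oclc_number(title, author)
    print title, "->", num and isbns_for(num)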

 - Godmar

On Thu, Oct 25, 2012 at 3:05 PM, Cab Vinton  wrote:

> I have a list of several hundred book titles & corresponding authors,
> comprising our State Library's book group titles, & am looking for
> ways of putting these titles online in a way that would be useful to
> librarians & patrons. Something along the lines of a LibraryThing
> collection or Amazon wishlist.
>
> Without ISBNs, however, the process could be very labor-intensive.
>
> Any suggestions for how we could handle this as part of a batch process?
>
> I realize that different manifestations of the same work will have
> different ISBNs, so we'd be seeking any work in print format, ideally
> the most commonly held.
>
> The only thought I've had is to do a Z39.50 search using the author &
> title Bib-1 attributes EG @and @attr 1=4 mansfield @attr 1=1003
> austen.
>
> Thanks for your thoughts,
>
> Cab Vinton, Director
> Sanbornton Public Library
> Sanbornton, NH
>


Re: [CODE4LIB] Q: "Discovery" products and authentication (esp Summon)

2012-10-24 Thread Godmar Back
On Wed, Oct 24, 2012 at 1:54 PM, Mark Mounts wrote:

> We have Summon at Dartmouth College. Authentication is IP based so with a
> Dartmouth IP address the user will see all our licensed content.
>
> There is also the option to see all the content Summon has beyond what we
> license by selecting the option "Add results beyond your library's
> collection"
>
>
According to my understanding, that's not what Jonathan is talking about.

You can select "Add results beyond your library's collection" while being
unauthenticated/off-campus, but this still won't show you the same results.

The results that are never displayed to unauthenticated users are those
Summon republishes from A&I databases.

"Add results beyond your library's collection" just adds (public) results
from the holdings of other libraries; it doesn't add A&I results.

 - Godmar


Re: [CODE4LIB] Q: "Discovery" products and authentication (esp Summon)

2012-10-24 Thread Godmar Back
On Wed, Oct 24, 2012 at 12:16 PM, Jonathan Rochkind wrote:

> Looking at the major 'discovery' products, Summon, Primo, EDS
>
> ...all three will provide some results to un-authenticated users (the
> general public), but have some portions of the corpus that are restricted
> and won't show up in your results unless you have an authenticated user
> affiliated with customer's organization.
>
>
I brought this issue up on the Summon clients mailing list a few weeks ago.

My impression from the resulting reaction was that people do not appear to
be overly concerned about it, because

a) most queries come from on-campus
b) the only results missing are those that come from re-published A&I
databases (which don't allow unauthenticated access), which is a minority
of content when compared to what is indexed by Summon itself
c) there's an option "Use Off Campus Sign In to access full text and more
content" users can use to avoid the problem.

Personally, I think it's little known, and insufficiently presented to the
user ("more content").

The key problem is that as libraries are increasingly offering their
discovery systems as OPAC replacements, users accustomed to the conventions
of OPACs do not expect this difference in behavior. OPACs generally
show the same results independent of the user's authentication status, and
do not require authentication just to search.

 - Godmar


Re: [CODE4LIB] Q.: software for vendor title list processing

2012-10-17 Thread Godmar Back
Thanks to everyone who replied to my question.

From a brief examination, if I understand it correctly, KBART and ONIX
create normative standards for how holdings data should be represented,
which vendors (increasingly) follow.

This leads to three follow-up questions.

First, is there software to translate/normalize existing vendor lists from
vendors that have not yet adopted either of these standards into these
formats? I'm thinking of a collection of adapters or converters, perhaps.
Each would likely require only a small effort, but there would be benefits
from sharing development and maintenance.

Second, if holdings lists were provided in, or converted to, for instance
the KBART format, what software understands these formats to further
process them? In other words, is there immediate bang for the buck of
adopting these standards?

Third, unsurprisingly, these efforts arose in the management of serials
because holdings there change frequently depending on purchase agreements,
etc. It is my understanding that eBooks are now posing similar collection
management challenges. Are there separate normative efforts for eBooks, or
is it believed that efforts such as KBART/ONIX can encompass eBooks as well?

 - Godmar


[CODE4LIB] Q.: software for vendor title list processing

2012-10-16 Thread Godmar Back
Hi,

at our library, there's an emerging need to process title lists from
vendors for various purposes, such as checking that the titles purchased
can be discovered via the discovery system and/or OPAC. It appears that the
formats in which those lists are provided are non-uniform, as is the
process of obtaining them.

For example, one vendor - let's call them "Expedition Scrolls" - provides
title lists for download to Excel, but which upon closer inspection turn
out to be HTML tables. They are encoded using an odd mixture of CP1250 and
HTML entities. Other vendors use entirely different formats.
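
To give a flavor of the ad-hoc massaging this requires, the converter for
this one vendor boils down to something like the following sketch (file
names are made up; BeautifulSoup is used because it tolerates the sloppy
markup):

# Sketch of a one-off converter: the vendor's "Excel" download is really an
# HTML table in CP1250 with HTML entities, so decode, parse, and re-emit it
# as clean UTF-8 CSV. File names are placeholders.
import csv
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

raw = open("titlelist.xls").read().decode("cp1250")
soup = BeautifulSoup(raw, convertEntities=BeautifulSoup.HTML_ENTITIES)

out = csv.writer(open("titlelist.csv", "wb"))
for row in soup.findAll("tr"):
    out.writerow([u"".join(cell.findAll(text=True)).strip().encode("utf-8")
                  for cell in row.findAll(["td", "th"])])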

My question is whether there are efforts, software, or anything related to
streamlining the acquisition and processing of vendor title lists in
software systems that aid in the collection development and maintenance
process. Any pointers would be appreciated.

 - Godmar


[CODE4LIB] isoncampus service

2012-06-14 Thread Godmar Back
A number of web applications, both client- and server-side, could benefit if
it could be easily determined whether a user is on or off campus with respect
to accessing resources that use IP-address based authentication.

For instance, a web site could show/hide a button asking the user to "log
in," or a proxied/non-proxied URL could be displayed depending on whether
the user is connecting from within/outside an authorized IP range. This
would reduce or eliminate the need for special proxy setups/unnecessary
proxy use and could improve the user experience.

This is probably a problem for which many ad-hoc solutions exist on
campuses as well as solutions integrated into vendor-provided systems. It
would be nice, and beneficial in particular to LibX, but presumably also to
other software facing this problem, to have a reusable service
implementation/response format that is easily deployable and requires only
minimum effort for setup and maintenance. Maintenance should be as simple
as maintaining a file with the IP-ranges in a directory, like many
libraries already do for their communication with database vendors or
publishers.
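
To sketch what "as simple as maintaining a file" means: the core of such a
service is just a CIDR match of the requesting IP against that file. The
shared implementation below is PHP; this Python sketch, with its made-up
file name and response shape, just illustrates the idea:

# Core of an isoncampus-style check: read CIDR ranges from a maintained file
# and test whether the requesting IP falls into any of them.
import json
import socket
import struct

def ip2int(ip):
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def load_ranges(path="ip-ranges.txt"):  # one "a.b.c.d/bits" range per line
    ranges = []
    for line in open(path):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        net, bits = line.split("/")
        mask = (0xFFFFFFFF << (32 - int(bits))) & 0xFFFFFFFF
        ranges.append((ip2int(net) & mask, mask))
    return ranges

def is_on_campus(client_ip, ranges):
    addr = ip2int(client_ip)
    return any(addr & mask == net for net, mask in ranges)

# e.g., in a CGI/WSGI handler:
# print json.dumps({"ip": remote_addr,
#                   "oncampus": is_on_campus(remote_addr, load_ranges())})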

My question is what existing ideas/standards/software exist for this
purpose, if any, or what ideas/approaches others could share.

I would like to point at a small piece of software I'm sharing, which is a
PHP-based isoncampus service [1]; a demo is available here [2]. If anyone
has a similar need and is interested in working together on a solution,
this could be a seed around which to start. Besides the easily deployable
PHP implementation, more efficient bindings/implementations for other
languages and/or server/cloud environments could be created (AppEngine comes
to mind.)

 - Godmar

[1] https://github.com/godmar/isoncampus
[2] http://libx.lib.vt.edu/services/isoncampus/isoncampus.php

ps: as a side-note, OCLC's OpenURL registry used to include IP-ranges as
they were known to OCLC; this was at some point removed due to privacy
concerns. I do note, however, that in general the ownership of IP-ranges is
public information, as are CIDR ranges, both of which are easily accessible
via web services provided by arin.net or by the regional registries. Though
mapping from an IP address to its owner is not the same as listing IP
ranges associated with an organization (many include multiple discontiguous
CIDR ranges), I note that some of this information is also public via the
BGP-advertised IP-prefixes for an institution's (main-) AS. In any event,
no one would be forced to run this service if they have privacy concerns.


Re: [CODE4LIB] WebOPAC/III Z39.50 PHP Query/PHPYAZ

2012-05-10 Thread Godmar Back
Scraping III systems has got to be one of the most frequently repeated
tasks in the history of coding librarianship.

Majax2 ([1,2]) is one such service, though (as of right now) it doesn't
support search by Call Number.
Here's an example ISBN search:
http://libx.lib.vt.edu/services/majax2/isbn/0747591059?opacbase=http://catalog.library.miami.edu/search

Since you have Summon, you could use their API.  Examples are here [3,4].

 - Godmar

[1] http://libx.lib.vt.edu/services/majax2/
[2] http://code.google.com/p/majax2/
[3] http://libx.lib.vt.edu/services/summon/test.php
[4] http://libx.lib.vt.edu/services/summon/

On Wed, May 9, 2012 at 11:27 AM, Madrigal, Juan A wrote:

> Hi,
>
> I'm looking for a way to send a Call Number to WebOPAC via a query so that
> I can return data (title, author, etc…) for a specific book in the catalog
> preferably in JSON or XML (I'll even take text at this point).
> I'm thinking that one way  to accomplish this is via Z39.50 and send a
> query to the backend that powers WebOPAC
>
> Has anyone done something similar to this?
>
> PHP YAZ (https://www.indexdata.com/phpyaz) looks promising, but I'd
> appreciate any guidance.
>
> Thanks,
>
> Juan Madrigal
>
> Web Developer
> Web and Emerging Technologies
> University of Miami
> Richter Library
>


Re: [CODE4LIB] Anyone using node.js?

2012-05-09 Thread Godmar Back
On Tue, May 8, 2012 at 11:26 PM, Ed Summers  wrote:

>
> For both these apps the socket.io library for NodeJS provided a really
> nice abstraction for streaming data from the server to the client
> using a variety of mechanisms: web sockets, flash socket, long
> polling, JSONP polling, etc. NodeJS' event driven programming model
> made it easy to listen to the Twitter stream, or the ~30 IRC channels,
> while simultaneously holding open socket connections to browsers to
> push updates to--all from within one process. Doing this sort of thing
> in a more typical web application stack like Apache or Tomcat can get
> very expensive where each client connection is a new thread or
> process--which can lead to lots of memory being used.
>
>
We've also been using socket.io for our cloudbrowser project, with great
success. The only drawback is that websockets don't (yet) support
compression, but that's not node.js fault. Another fault: you can't easily
migrate open socket.io connections across processes (yet). FWIW, since you
mention Rackspace - the lead student on the the cloudbrowser project has
now accepted a job at Rackspace (having turned down M$), in part because he
finds their technology/environment more exciting.

I need to dampen the enthusiasm about memory use a bit. It's true that
you're saving the memory for additional threads etc., but - depending on
your application - you're also paying for that, because V8 still lacks some
opportunities for sharing that other environments have. For instance, if you
run 25 Apache instances with say mod_whatever, they'll all share the code
via a shared .so file. In Java/Tomcat, the JVM exploits, under the hood,
similar sharing opportunities.

V8/node.js, as of now, does not. This means if you need to load libraries
such as jQuery n times, you're paying a substantial price (we found on the
order of 1-2MB per instance), because V8 will not do any code sharing under
the hood.  That said, whether you need to load it multiple times depends on
your application - but that's another subtle and error prone issue.


> If you've done any JavaScript programming in the browser, it will seem
> familiar, because of the extensive use of callbacks. This can take
> some getting used to, but it can be a real win in some cases,
> especially in applications that are more I/O bound than CPU bound.
> Ryan Dahl (the creator of NodeJS) gave a presentation [4] to a PHP
> group last year which does a really nice job of describing how NodeJS
> is different, and why it might be useful for you. If you are new to
> event driven programming I wouldn't underestimate how much time you
> might spend feeling like you are turning your brain inside out.
>
>
The complications arising from event-based programming are an extensively
written-about topic of research; one available approach is the use of
compilers that provide a linear syntax for asynchronous calls. The TAME
system, which originally arose from research at MIT, is one such example.
Originally for C++, there's now a version for JavaScript available:
http://tamejs.org/  Though I haven't tried it myself, I'm eager to, and I
would also like to know if someone else has. The tamejs.org site provides
excellent reading for why/how you'd want to do this.

 - Godmar


Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Godmar Back
On Tue, May 8, 2012 at 10:17 AM, Ethan Gruber  wrote:

> Thanks.  I have been working on a system that allows editing of RDF in web
> forms, creating linked data connections in the background, publishing to
> eXist and Solr for dissemination, and will eventually integrate operation
> with an RDF triplestore/SPARQL, all with Tomcat apps.  I'm not sure it is
> possible to create, manage, and deliver our content with node.js, but I was
> told by the project manager that Apache, Java, and Tomcat were "showing
> signs of age."  I'm not so sure about this considering the prevalence of
> Tomcat apps both in libraries and industry.  I happen to be very fond of
> Solr, and it seems very risky to start over in node.js, especially since I
> can't be certain the end product will succeed.  I prefer to err on the side
> of stability.
>
> If anyone has other thoughts about the future of Tomcat applications in the
> library, or more broadly cultural heritage informatics, feel free to jump
> in.  Our data is exclusively XML, so LAMP/Rails aren't really options.
>
>
We've used node.js (but not Express, their web app framework) to build our
own experimental AJAX framework (http://cloudbrowser.cs.vt.edu/ ). We also
have extensive experience with Tomcat-based systems.

Given the wide and increasing use of node.js, I'm optimistic that it
should be stable and reliable enough for your needs; let me emphasize a few
points you may want to consider.

a) You're programming in JavaScript/CoffeeScript, which is a higher-level
language than Java. My students are vastly more productive in it than in
Java. The use of CoffeeScript and require still allows for maintainable code.

b) node.js is a single-threaded environment. This reduces the potential for
some race conditions, but it requires an asynchronous programming style,
which in turn introduces new potential for races. If you've done client-side
AJAX, you'll find it familiar; otherwise, you need to adapt.

c) Scalability. Each node.js instance runs on a single core; modules exist
for clustering on a single machine. I don't know/don't believe session
state replication is as well supported as for Tomcat. On the other hand,
Tomcat can be a setup nightmare (in my experience).

d) Supporting libraries. We've found the surrounding infrastructure
excellent. A large community is developing for it (see
http://search.npmjs.org/). The cool thing is that many client-side libraries
work or are easily ported (e.g. moment.js).

e) Doing XML in JavaScript. Though JavaScript as a language was designed to
be embedded in HTML documents, processing XML in JavaScript can be almost as
awkward as in Java. JSON is clearly preferred and integrates very naturally.

 - Godmar


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-12 Thread Godmar Back
On Mon, Mar 12, 2012 at 3:38 AM, Ed Summers  wrote:

> On Fri, Mar 9, 2012 at 12:12 PM, Godmar Back  wrote:
> > Here's my hand ||*(  [1].
>
> ||*)
>
> I'm sorry that I was so unhelpful w/ the "patches welcome" message on
> your docfix. You're right, it was antagonistic of me to suggest you
> send a patch for something so simple. Plus, it wasn't even accurate,
> because I actually wanted a pull request :-)
>
>
Here's a make-up pull request especially made for you :-)

https://github.com/edsu/pymarc/pull/25

 - Godmar


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Godmar Back
On Fri, Mar 9, 2012 at 11:48 AM, Jon Gorman wrote:

>
> Can't we all just shake hands virtually or something?
>
>
Here's my hand ||*(  [1].

I overreacted, for which I'm sorry. (Also, I didn't see the entire github
conversation until I just now visited the website; the github email
notification seems selective and only sent me Ed's replies (?) in my
emailbox.)

 - Godmar

[1] http://www.kadifeli.com/fedon/smiley.htm


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Godmar Back
On Fri, Mar 9, 2012 at 10:37 AM, Michael B. Klein  wrote:

> The internal discussion then becomes, "I have a need, and I've written
> something that satisfies it. I think it could also be useful to others, but
> I'm not going to have time to make major changes or implement features
> others need. Should I open source this or keep it to myself? Does freeing
> my code come with an implicit requirement to maintain and support it?
> Should it?"
>
>
It used to be that way, at least it was this way when I grew up in open
source (in the 90s, before Eric Raymond invented the term). And it makes
sense, for successful projects that have at least a moderate number of
users.  Just dumping your code on github helps very few people.


> I'd vote open source just about every time. If someone sees the need and
> has the time to do a functional/requirements analysis and develop a "core
> team" around pymarc, more power to them. The code that's already there will
> give them a head start. Or they can start from scratch.
>
> Until then, it will remain a fork-patch-and-pull, community-supported
> project.
>

It's not just an agreement on design goals the core team must reach; it's
also the issue of maintaining a record (in email discussions/posts and in
the developers' minds) of what issues arose, what legacy decisions were
made, and where backwards compatibility is required. That's something
maintainers do; it enables them to reason about future design
decisions, and it requires people who feel a sense of ownership and mental
investment. Sure, I could throw in a flag 'dont_utf8_encode' to make the
code work for my case. But it wouldn't improve the software.  (In pymarc's
case, I'd also recommend a discussion about data structures. For instance,
what should the type of the elements of the subfield array be that's passed
to a Field constructor? 8-bit strings or unicode objects? The thread you
link to shows ambiguity here.)

Staying with fork-patch-and-pull may help individual people meet their
individual needs, but it can prevent widespread adoption - and it creates
frustration for users who may lack the expertise to track down encoding
errors or who are even unable to understand where the code they're using
lives on their machine. Once a piece of software has reached the stage
where it's distributed as a package (which pymarc, I believe, is), the
distributors have taken on a piece of responsibility. Relatedly, being
unwilling to fix even documentation typos unless someone clones the
repository and delivers a pull request (on a silver platter?) seems unusual
to me, but - perhaps I'm just too old and culturally out of tune with
today's open source movement. (I'm not being ironic here; maybe there has
been a shift and I should just get with it.)

 - Godmar


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-09 Thread Godmar Back
On Thu, Mar 8, 2012 at 3:53 PM, Mark A. Matienzo  wrote:

> On Thu, Mar 8, 2012 at 3:32 PM, Godmar Back  wrote:
>
> > One side comment here; while smart handling/automatic detection of
> > encodings would be a nice feature to have, it would help if pymarc could
> > operate in an 'agnostic', or 'raw' mode where it would simply preserve
> the
> > encoding that's there after a record has been read when writing the
> record.
> >
> > [ Right now, pymarc does not have such a mode - if leader[9] == 'a', the
> > data is unconditionally utf8 encoded on output as per mbklein's patch. ]
>
> Please feel free to write a patch and submit a pull request if you're
> able to contribute code to do this.
>
>
Mark, while I would be able to contribute code to pymarc, I probably won't
(unless my collaborators' needs with respect to pymarc become urgent.)

I've been contributing to open source for over 15 years, my first major
contribution having been the ext2fs filesystem code in the FreeBSD kernel (
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/filesystems-linux.html
)
and I'm a bit confused by how the spirit in the community has changed.  The
phrase "patches welcome" used to be reserved for when there was a feature
request somebody wanted, but you (the owner/maintainer of the software)
didn't have the time or considered the problem not important.

Back then, it used to be that all suggestions were welcome. For instance,
if a user pointed out a typo, you'd fix it. Similarly, if a user or fellow
developer pointed out a potential design flaw, you'd understand that you
don't ask for patches, but that you go back to the drawing board and think
about your software's design. In pymarc's case, what's needed is not more
code (it already has a moderately confusing set of almost a dozen switches
for reading/writing), but a requirements analysis where you think about use
cases you want to support. For instance, whether you want to support
reading/writing real world records in batches (without touching them) even
if they have flaws or not. And/Or whether you insist on interpreting a
record's data in terms of encoding, always. That's something occasional
contributors cannot do, it requires work by the core team, in discussion
with frequent users. (I would have liked to take this discussion to a
pymarc-users list, but didn't find any.)

 - Godmar


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-08 Thread Godmar Back
On Thu, Mar 8, 2012 at 3:18 PM, Ed Summers  wrote:

> Hi Terry,
>
> On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry
>  wrote:
> > This is one of the reasons you really can't trust the information found
> in position 9.  This is one of the reasons why when I wrote MarcEdit, I
> utilize a mixed process when working with data and determining characterset
> -- a process that reads this byte and takes the information under
> advisement, but in the end treats it more as a suggestion and one part of a
> larger heuristic analysis of the record data to determine whether the
> information is in UTF8 or not.  Fortunately, determining if a set of data
> is in UTF8 or something else, is a fairly easy process.  Determining the
> something else is much more difficult, but generally not necessary.
>
> Can you describe in a bit more detail how MARCEdit sniffs the record
> to determine the encoding? This has come up enough times w/ pymarc to
> make it worth implementing.
>
>
One side comment here; while smart handling/automatic detection of
encodings would be a nice feature to have, it would help if pymarc could
operate in an 'agnostic', or 'raw' mode where it would simply preserve the
encoding that's there after a record has been read when writing the record.

[ Right now, pymarc does not have such a mode - if leader[9] == 'a', the
data is unconditionally utf8 encoded on output as per mbklein's patch. ]
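
For illustration, such a raw pass-through doesn't even need pymarc's writer:
records in a transmission file are length-prefixed, so an agnostic copy can
be had with something like this sketch (file names made up; a throwaway
pymarc.Record(data=raw) can still be built per record for inspection):

# Sketch of an "agnostic/raw" pass-through: split the file on the 5-digit
# record length in each leader and copy the bytes verbatim, so no
# decode/encode step ever touches the data.
def raw_records(path):
    data = open(path, "rb").read()
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 5])  # leader bytes 0-4 = record length
        yield data[pos:pos + length]
        pos += length

out = open("out.mrc", "wb")
for raw in raw_records("in.mrc"):
    out.write(raw)  # bytes preserved exactly as read
out.close()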

 - Godmar


Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-08 Thread Godmar Back
On Thu, Mar 8, 2012 at 1:46 PM, Terray, James  wrote:

> Hi Godmar,
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9:
> ordinal not in range(128)
>
> Having seen my fair share of these kinds of encoding errors in Python, I
> can speculate (without seeing the pymarc source code, so please don't hold
> me to this) that it's the Python code that's not set up to handle the UTF-8
> strings from your data source. In fact, the error indicates it's using the
> default 'ascii' codec rather than 'utf-8'. If it said "'utf-8' codec can't
> decode...", then I'd suspect a problem with the data.
>
> If you were to send the full traceback (all the gobbledy-gook that Python
> spews when it encounters an error) and the version of pymarc you're using
> to the program's author(s), they may be able to help you out further.
>
>
My question is less about the Python error, which I understand, than about
the MARC record causing the error and about how others deal with this issue
(if it's a common issue, which I do not know.)

But, here's the long story from pymarc's perspective.

The record has leader[9] == 'a', but really, truly contains ANSEL-encoded
data.  When reading the record with a MARCReader(to_unicode = False)
instance, the record reads ok since no decoding is attempted, but attempts
at writing the record fail with the above error since pymarc attempts to
utf8 encode the ANSEL-encoded string which contains non-ascii chars such as
0xe8 (the ANSEL Umlaut prefix). It does so because leader[9] == 'a' (see
[1]).

When reading the record with a MARCReader(to_unicode=True) instance, it'll
throw an exception during marc_decode when trying to utf8-decode the
ANSEL-encoded string. Rightly so.
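
In code, the failure path looks like this (a sketch against the pymarc
version I'm using; the file name is made up):

# Reproduces the UnicodeDecodeError described above: leader[9] == 'a' but the
# field data is ANSEL, so as_marc() utf8-encodes a byte string containing
# 0xE8 and the implicit ascii decode blows up.
from pymarc import MARCReader

reader = MARCReader(open("iii_export.mrc", "rb"), to_unicode=False)
record = reader.next()                         # reads fine; nothing decoded
open("out.mrc", "wb").write(record.as_marc())  # raises UnicodeDecodeError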

I don't blame pymarc for this behavior; to me, the record looks wrong.

 - Godmar

(ps: that said, what pymarc does fails in different circumstances - from
what I can see, pymarc shouldn't assume that it's ok to utf8-encode the
field data if leader[9] is 'a'.  For instance, this would double-encode
correctly encoded Marc/Unicode records that were read with a
MARCReader(to_unicode=False) instance. But that's a separate issue that is
not my immediate concern. pymarc should probably remember if a record needs
or does not need encoding when writing it, rather than consulting the
leader[9] field.)


[1]
https://github.com/mbklein/pymarc/commit/ff312861096ecaa527d210836dbef904c24baee6


[CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

2012-03-08 Thread Godmar Back
Hi,

a few days ago, I showed pymarc to a group of technical librarians to
demonstrate how easily certain tasks can be scripted/automated.

Unfortunately, it blew up at me when I tried to write a record:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9:
ordinal not in range(128)

Investigation revealed this culprit:

=LDR  00916nam a2200241I  4500
=001  ocm10685946
=005  19880203211447.0
=007  cr\bn||abp
=007  cr\bn||cda
=008  840503s1939gw00010\ger\d
=040  \\$aMBB$cMBB$dCRL
=049  \\$aCRLL
=100  10$aEsser, Hermann,$d1900-
=245  14$aDie judischer Weltpest ;$bjudendammerung auf dem
Erdball,$cvon Hermann Esser.
=260  0\$aMunchen,$bZentralverlag der N S D A P., F. Eher ahchf.,$c1939.
=300  \\$a243 [1] p.$c23 cm.
=533  \\$aAlso available as electronic reproduction.$bChicago :$cCenter for
Research Libraries,$d[2009]
=650  \0$aJewish question.
=700  12$aBierbrauer, Johann Jacob,$d1705-1760?
=710  2\$aCenter for Research Libraries (U.S.)
=856  41$uhttp://dds.crl.edu/CRLdelivery.asp?tid=10538$zOnline version
=907  \\$a.b28931622$b08-30-10$c08-30-10
=998  \\$awww$b08-30-10$cm$dz$e-$fger$ggw $h4$i0

The leader[9] field is set to 'a', so the record should contain
UTF8-encoded Unicode [1], but E8 75 in the 245$a appears to be ANSEL where
'E8' denotes the Umlaut preceding the lowercase 'u' (0x75). [2]

To me, this record looks misencoded... am I correct here? There are
thousands of such records in the data set I'm dealing with, which was
obtained using the 'Data Exchange' feature of III's Millennium system.

My question is how others, especially pymarc users dealing with III
records, deal with this issue or whatever other
experiences/hints/practices/kludges exist in this area.
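
In case it helps others, the kludge we're using for triage is to scan the
raw records and flag those that claim Unicode but don't decode - a sketch
(file name made up):

# Flag records whose leader/09 says 'a' (UCS/Unicode) but whose bytes are
# not valid UTF-8 -- the symptom of the misencoded III exports shown above.
data = open("export.mrc", "rb").read()
pos = 0
while pos < len(data):
    length = int(data[pos:pos + 5])
    raw = data[pos:pos + length]
    pos += length
    if raw[9] == "a":
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            print "suspect record at offset", pos - length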

Thanks.

 - Godmar

[1] http://www.loc.gov/marc/bibliographic/bdleader.html
[2] http://lcweb2.loc.gov/diglib/codetables/45.html


Re: [CODE4LIB] "Repositories", OAI-PMH and web crawling

2012-02-27 Thread Godmar Back
On Mon, Feb 27, 2012 at 8:31 AM, Diane Hillmann wrote:

> On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens  wrote:
>
> >
> > This issue is certainly not unique to VT - we've come across this as part
> > of our project. While the OAI-PMH record may point at the PDF, it can
> also
> > point to a intermediary page. This seems to be standard practice in some
> > instances - I think because there is a desire, or even requirement, that
> a
> > user should see the intermediary page (which may contain rights
> information
> > etc.) before viewing the full-text item. There may also be an issue where
> > multiple files exist for the same item - maybe several data files and a
> pdf
> > of the thesis attached to the same metadata record - as the metadata via
> > OAI-PMH may not describe each asset.
> >
> >
> This has been an issue since the early days of OAI-PMH, and many large
> providers provide such intermediate pages (arxiv.org, for instance). The
> other issue driving providers towards intermediate pages is that it allows
> them to continue to derive statistics from usage of their materials, which
> direct access URIs and multiple web caches don't.  For providers dependent
> on external funding, this is a biggie.
>
>
Why do you place direct access URIs and multiple web caches into the same
category? I follow your argument re: usage statistics for web caches, but
as long as the item remains hosted in the repository, direct access URIs
should still be counted (provided proper cache-control headers are sent.)
Perhaps it would require server-side statistics rather than client-based GA.

Also, it seems to me that, except for Google, full-text indexing engines
don't necessarily want to become providers of cached copies (certainly the
discovery systems currently provided commercially don't, AFAIK.)

 - Godmar


Re: [CODE4LIB] "Repositories", OAI-PMH and web crawling

2012-02-27 Thread Godmar Back
On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens  wrote:

> On 26 Feb 2012, at 14:42, Godmar Back wrote:
>
> > May I ask a side question and make a side observation regarding the
> > harvesting of the full text of the object to which an OAI-PMH record
> > refers?
> >
> > In general, is the idea to use the <dc:identifier>/text() element, treat
> > it as a URL, and then expect to find the object there (provided that
> > there was a suitable <dc:type> and <dc:format> element)?
> >
> I think dc:identifier is usually used to provide a URL for the item being
> described. The examples at
> http://www.openarchives.org/OAI/openarchivesprotocol.html#dublincore
> follow this, and the UK E-Thesis schema (
> http://naca.central.cranfield.ac.uk/ethos-oai/2.0/oai-uketd.xml) does as
> well.
>
>
Thanks. FWIW, the <dc:identifier> contains the same URL as the header
<identifier> field in my example; but your interpretation of the
<dc:identifier> matches that found in the OAI-PMH spec at
http://www.openarchives.org/OAI/openarchivesprotocol.html#UniqueIdentifier
where it also points out that it may not necessarily be a URL; it could be
any URN or even a DOI, as long as it relates the metadata to the underlying
item.

> This issue is certainly not unique to VT - we've come across this as part
> of our project.


I note that this means that providing the service point URL for the ETD
OAI-PMH server is not sufficient to facilitate full-text
harvesting/indexing by a provider such as Summon. (And sure enough, they've
indexed only the metadata.) They would have to, and will have to, employ
additional effort.

Re: your points about the right to full-text index.

If indeed you're right that full-text indexing is a fair use (is it? Eric
Hellmann seems to indicate so:
http://go-to-hellman.blogspot.com/2010/02/copyright-safe-full-text-indexing-of.html
as
long as the technical definition of making a copy is met.) - if that's
indeed so, then of course the intentions of the author don't matter, at
least in the US legal system.  Otherwise, my point would have been that I'd
like to see the signed ETD agreement forms extended to explicitly include
the author's permission for full-text indexing.

 - Godmar


Re: [CODE4LIB] "Repositories", OAI-PMH and web crawling

2012-02-26 Thread Godmar Back
May I ask a side question and make a side observation regarding the
harvesting of the full text of the object to which an OAI-PMH record refers?

In general, is the idea to use the <dc:identifier>/text() element, treat it
as a URL, and then expect to find the object there (provided that there was a
suitable <dc:type> and <dc:format> element)?

Example: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl allows the
harvesting of ETD metadata.  Yet, its metadata reads:

<oai_dc:dc>
   ...
   <dc:type>text</dc:type>
   <dc:format>application/pdf</dc:format>
   <dc:identifier>
   http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/
   </dc:identifier>
   ...
</oai_dc:dc>

When one visits
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/, however,
there is no 'text' document of type 'application/pdf' - rather, it's an
HTML title page that embeds links to one or more PDF documents, such as
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/unrestricted/Walker_1.pdf
through Walker_5.pdf.

Is VT's ETD OAI implementation deficient, or is OAI-PMH simply not set up
to allow the harvesting of full-text without what would basically amount to
crawling the ETD title page, or other repository-specific mechanisms?
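
For concreteness: harvesting the pointers themselves is straightforward -
it's only what lies behind them that would require crawling. A minimal
ListRecords walk (an untested sketch) that collects the dc:identifier
values:

# Walk ListRecords with oai_dc and yield every dc:identifier -- the URL a
# harvester would have to start from before any site-specific crawling.
import urllib
import urllib2
from xml.etree import ElementTree

BASE = "http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl"
DC = "{http://purl.org/dc/elements/1.1/}"
OAI = "{http://www.openarchives.org/OAI/2.0/}"

def dc_identifiers():
    token = None
    while True:
        args = {"verb": "ListRecords"}
        if token:
            args["resumptionToken"] = token
        else:
            args["metadataPrefix"] = "oai_dc"
        root = ElementTree.parse(
            urllib2.urlopen(BASE + "?" + urllib.urlencode(args))).getroot()
        for el in root.getiterator(DC + "identifier"):
            yield el.text
        tok = root.find(".//" + OAI + "resumptionToken")
        if tok is None or not tok.text:
            return
        token = tok.text

for url in dc_identifiers():
    print url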

On a related note, regarding rights. As a faculty member, I regularly sign
ETD approval forms.  At Tech, students have three options to choose from:
(a) open and immediate access, (b) restricted to VT for 1 year, (c)
withhold access completely for 1 year for patent/security purposes.  The
current form does not allow student authors to address whether the
full-text of their dissertation may be harvested for the purposes of
full-text indexing in such indexes as Google or Summon, nor does it allow
them to restrict where copies are served from.  Similarly, the dc:rights
section in the OAI-PMH records addresses copyright only.  In practice, Google
crawls, indexes, and serves full-text copies of our dissertations.

 - Godmar


Re: [CODE4LIB] Voting for c4l 2012 talks ends today

2011-12-09 Thread Godmar Back
This site shows:

Ruby (Rack) application could not be started

On Fri, Dec 9, 2011 at 11:50 AM, Anjanette Young wrote:

> Get your votes in before 5pm (PST)
>
> http://vote.code4lib.org/election/21  -- You will need your
> code4lib.org login in order to vote. If you do not have one you can
> create one at http://code4lib.org/
>


Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)

2011-12-08 Thread Godmar Back
On Thu, Dec 8, 2011 at 12:24 PM, Brian Tingle wrote:
> most sense for what you are trying to do.  And for things that work that
> way now, I don't see a need to rush and change it all to JSONP callbacks
> because of some vague security concern.

My comment wasn't security-related. Also, I wasn't talking about
cross-domain JSONP. Obviously, you need to trust the producer there.

That said, I do buy the security argument that HTML is much harder to
verify for absence of, for instance, XSS vulnerabilities. At least
that's what can be inferred from the high frequency with which they're
occurring. Reducing the number of times (specifically places in the
code) where one generates and transmits it could certainly help here.

 - Godmar


Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)

2011-12-08 Thread Godmar Back
On Thu, Dec 8, 2011 at 11:14 AM, BRIAN TINGLE wrote:
> returning JSONP is the cool hipster way to go (well, not hipster cool
> anymore, but the hipsters were doing it before it went mainstream), but I'm
> not convinced it is inherently a problem to return HTML for use in "AJAX"
> type development in a non-ironic-retro way.

Let me give you an example of why returning HTML is a difficult
approach, to say the least, when it comes to rich AJAX applications. I
had in my argument referred to a trend connected to the increasing
richness and interactivity of the AJAX applications being developed today.

Say you have an <input> text component in your app. When its value
changes, a number of other components need to be updated. For
instance, a label (which is a <span>), and a button. In addition,
new elements may need to appear on the page. For example, in our
edition builder app, a user may change the name of a proxy (as in
EZProxy.) We need to update the title, we need to update the label on
the button with which it can be removed, and we need to update the
status message recording when the configuration was last changed.
The response looks somewhat like this:

{"rs":[
  ["setAttr",["dP1Qvi","value","Off-Campus"]]
  ["setAttr",["dP1Q9m0","label","Delete Proxy Off-Campus"]],
  ["setAttr",["dP1Qy7","content","12:01:03 PM Dec 8, 2011 edition 1C7B69E1
rev 1<\/a> successfully saved. Click here<\/a>
to try out this configuration."]],
],"rid":11}

(simplified here.)

Coding this as a single HTML response is impossible. First off, it
would mean finding a common ancestor element that could be replaced,
which would result in a huge transfer. Obviously, you don't want
multiple AJAX requests. So you'd come up with your own encoding of all
updates to be done on the client. You could do that using multiple
HTML fragments, and defining your own response format around them.
Older frameworks have done that, but have since realized that it's
better to simply return JSON-encoded instructions to the client about
what to update.

If we tell newbies (no offense meant by that term) that AJAX means
"send a request and then insert a chunk of HTML in your DOM," we're
short-changing their view of the type of Rich Internet Application
(RIA) AJAX today is equated with.

 - Godmar


Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-07 Thread Godmar Back
On Tue, Dec 6, 2011 at 3:40 PM, Doran, Michael D  wrote:

>
> > Current trends certainly go in the opposite direction, look at jQuery
> > Mobile.
>
> I agree that jQuery Mobile is very popular now.  However, that in no way
> negates the caution.  One could consider it as a "tragedy of the commons"
> in which a user's iPhone battery is the shared resource.  Why should I as a
> developer (rationally consulting my own self-interest) conserve battery
> power that doesn't belong to me, just so some other developer's app can use
> that resource?  I'm just playing the devil's advocate here. ;-)
>
>
You're taking it as given that the use of JavaScript on a mobile device is
significantly less energy-efficient than an approach that would exercise
only the HTML parsing path. Be careful here; intuition can be misleading.
Devices cannot send HTML directly to their displays. It takes energy to
parse it, and energy to render it. Time is roughly proportional to energy.
Where do you think most time/energy is spent: (page-provided) JavaScript
execution, HTML parsing, or page layout/rendering?

Based on the information I have available to me (I'd appreciate pointers to
other studies), JS execution does not dominate - it ranks last behind page
layout and rendering [1], even for sites that are JS heavy, such as webmail
sites. Interestingly, a large part of that is evaluating CSS selectors.

On a related note, let me point out that there are many ways to change the
DOM on the client. Client-side templating frameworks such as knockout.js or
jQuery tmpl produce HTML (which then must be parsed), but modern AJAX
frameworks such as ZK don't produce any HTML at all, skipping parsing
altogether.

I meant to add another reason why at this point teaching newbies an AJAX
style that relies on HTML-returning entry points is a really bad idea, and
that is the move from read-only applications (like Nate's) to applications
that actually update state on the server. In this case, multiple parts of
the client page (perhaps a label here, a link there) need to be updated.
Expressing this in HTML is cumbersome, to say the least. (As an aside, I
note that AJAX frameworks such as ZK, which pursued the HTML approach in
their first iterations, have moved away from it. Compare the client/server
traffic of a ZK 3.x application to that of a ZK 5 app to see this.)

For those interested in how to use one of the possible client-side
approaches I'm suggesting, I prototyped Nate's application using only
client-side templating: http://libx.lib.vt.edu/services/popsubjects/cs/ .
It uses
knockout.js's data binding facilities as well as (due to qTip 1.0's design)
the jQuery tmpl engine. Read the (small, self-contained) source to learn
about the server-side entry points. (I should point out that in this case,
the need for the book cover ISBNs to be retrieved remotely is somewhat
contrived; they should probably be sent along with the page in the first
place.) A side effect of this JSON-oriented design is that it results in 2
nice JSON-P web services that can be embedded/used in other
pages/applications.

 - Godmar

[1]
http://www.eecs.berkeley.edu/~lmeyerov/projects/pbrowser/pubfiles/login.pdf


Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-06 Thread Godmar Back
On Tue, Dec 6, 2011 at 1:57 PM, Jonathan Rochkind  wrote:

> On 12/6/2011 1:42 PM, Godmar Back wrote:
>
>> Current trends certainly go in the opposite direction, look at jQuery
>> Mobile.
>>
>
> Hmm, JQuery mobile still operates on valid and functional HTML delivered
> by the server. In fact, one of the designs of JQuery mobile is indeed to
> degrade to a non-JS version in "feature phones" (you know, eg, flip phones
> with a web browser but probably no javascript).  The non-JS version it
> degrades to is the same HTML that was delivered to the browser in either
> way, just not enhanced by JQuery Mobile.
>

My argument was that current platforms, such as jQuery Mobile, heavily rely
on JavaScript on the very platforms on which Crockford's statement points out
it would be wise to save energy. Look at the A-grade platforms in the jQuery
Mobile documentation:
http://jquerymobile.com/demos/1.0/docs/about/platforms.html



> If I were writing AJAX requests for an application targeted mainly at
> JQuery Mobile... I'd be likely to still have the server deliver HTML to
> the AJAX request, then have js insert it into the page and trigger JQuery
> Mobile enhancements on it.
>
>
I wouldn't. Return JSON and interpret or template the result.

 - Godmar


Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-06 Thread Godmar Back
On Tue, Dec 6, 2011 at 11:22 AM, Doran, Michael D  wrote:

> > You had earlier asked the question whether to do things client or server
> > side - well in this example, the correct answer is to do it client-side.
> > (Yours is a read-only application, where none of the advantages of
> > server-side processing applies.)
>
> One thing to take into consideration when weighing the advantages of
> server-side vs. client-side processing, is whether the web app is likely to
> be used on mobile devices.  Douglas Crockford, speaking about the fact that
> JavaScript has become the de facto universal runtime, cautions: "Which I
> think puts even more pressure on getting JavaScript to go fast.
> Particularly as we're now going into mobile. Moore's Law doesn't apply to
> batteries. So how much time we're wasting interpreting stuff really matters
> there. The cycles count."[1]  Personally, I don't know enough to know how
> significant the impact would be.  However, I understand Douglas Crockford
> knows a little something about JavaScript and JSON.
>
>
It's certainly true that limited energy motivates the need to minimize
client processing, but the conclusion that this then means server
generation of static HTML is not clear.

Current trends certainly go in the opposite direction, look at jQuery
Mobile.

 - Godmar


Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-06 Thread Godmar Back
On Tue, Dec 6, 2011 at 11:18 AM, Nate Hill  wrote:

> I attached the app as it stands now.  There's something wrong w/ the regex
> matching in catscrape.php so only some of the images are coming through.
>

No, it's not the regexp. You're simply scraping syndetics links, without
checking whether syndetics has an image for those ISBNs. The searches
where the first four hits have jackets display covers; the others don't.


> Also: should I be sweating the fact that basically every time someone
> mouses over one of these boxes they are hitting our library catalog with a
> query?  It struck me that this might be unwise.  But I don't know either
> way.
>
>
Yes, it's unwise, especially since the results won't change (much).

 - Godmar


Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-06 Thread Godmar Back
On Tue, Dec 6, 2011 at 8:38 AM, Erik Hatcher  wrote:

> I'm with jrock on this one.   But maybe I'm a luddite that didn't get the
> memo either (but I am credited for being one of the instrumental folks in
> the Ajax world, heh - in one or more of the Ajax books out there, us old
> timers called it "remote scripting").
>
>
On the in-jest rhetorical front, I'm wondering if referring to oneself as
an oldtimer helps in defending against insinuations that opposing
technological change makes one a defender of the old ;-)

But:


> What I hate hate hate about seeing JSON being returned from a server for
> the browser to generate the view is stuff like:
>
>   string = "<div>" + some_data_from_JSON + "</div>";
>
> That embodies everything that is wrong about Ajax + JSON.
>
>
That's exactly why you use new libraries such as knockout.js, to avoid just
that. Client-side template engines with automatic data-bindings.

Alternatively, AJAX frameworks use JSON and then interpret the returned
objects as code. Take a look at the client/server traffic produced by ZK,
for instance.


> As Jonathan said, the server is already generating dynamic HTML... why
> have it return


It isn't. There is no server already generating anything; it's a new app
Nate is writing. (Unless you count his work of the past two days.) The
dynamic HTML he's generating is heavily tailored to his JS. There's
extremely tight coupling, which now exists across multiple files written in
multiple languages. Simply avoidable bad software engineering. That's not
even making the computational cost argument that avoiding template
processing on the server is cheaper. And with respect to Jonathan's
argument of degradation, a degraded version of his app (presumably) would
use something else entirely - it'd look nothing like what he showed us
yesterday.

Heh - the proof of the pudding is in the eating. Why don't we create 2
versions of Nate's app, one with mixed server/client - like the one he's
completing now, and I create the client-side based one, and then we compare
side by side?  I'll work with Nate on that.

  - Godmar

[ I hope it's ok to snip off the rest of the email trail in my reply. ]


Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-05 Thread Godmar Back
On Mon, Dec 5, 2011 at 6:45 PM, Jonathan Rochkind  wrote:

> I still like sending HTML back from my server. I guess I never got the
> message that that was out of style, heh.
>
>
I suppose there are always some stalwart defenders of the status quo ;-)

More seriously, I think I'd like to defend my statement.

The purpose of graceful degradation is well-acknowledged - I don't think
no-JS browsers are much of a concern, but web spiders are and so are
probably ADA accessibility requirements, as well as low-bandwidth
environments.

I do not believe, however, that such situations warrant any sharing of HTML
templates. If they do, it means your app is, well, perhaps outdated in that
it doesn't make full use of today's JS features. Certainly Gmail's "basic
html version for low bandwidth environments" shares no HTML templates with
the JS main app. In Nate's case, which is a heavily JS-dependent app (he
uses various jQuery plug-ins to drive his layout, as well as qtip for
tooltips), I find it difficult to see how any degraded environment would
share any HTML with his app.

That said, I'm genuinely interested in what others are thinking/have
experienced.

Also, for expository purposes, I'd love to prototype the client-side for
Nate's app. Then we could compare the mixed PHP server/client-side AJAX
version with the pure JS app I'm suggesting.

 - Godmar


On Mon, Dec 5, 2011 at 6:45 PM, Jonathan Rochkind  wrote:

> I still like sending HTML back from my server. I guess I never got the
> message that that was out of style, heh.
>
> My server application already has logic for creating HTML from templates,
> and quite possibly already creates this exact same piece of HTML in some
> other place, possibly for use with non-AJAX fallbacks, or some other
> context where that snippet of HTML needs to be rendered. I prefer to re-use
> this logic that's already on the server, rather than have a duplicate HTML
> generating/templating system in the javascript too.  It's working fine for
> me, in my use patterns.
>
> Now, certainly, if you could eliminate any PHP generation of HTML at all,
> as I think Godmar is suggesting, and basically have a pure Javascript app
> -- that would be another approach that avoids duplication of HTML
> generating logic in both JS and PHP. That sounds fine too. But I'm still
> writing apps that degrade if you have no JS (including for web spiders that
> have no JS, for instance), and have nice REST-ish URLs, etc.   If that's
> not a requirement and you can go all JS, then sure.  But I wouldn't say
> that making apps that use progressive enhancement with regard to JS and
> degrade fine if you don't have is "out of style", or if it is, it ought not
> to be!
>
> Jonathan
>
>
>


Re: [CODE4LIB] jQuery Ajax request to update a PHP variable

2011-12-05 Thread Godmar Back
FWIW, I would not send HTML back to the client in an AJAX request - that
style of AJAX fell out of favor years ago.

Send back JSON instead and keep the view logic client-side. Consider using
a library such as knockout.js. Instead of your current (difficult to
maintain) mix of PHP and client-side JavaScript, you'll end up with a
static HTML page, a couple of clean JSON services (one for checked-out items
per subject, and one for the syndetics ids of the first 4 covers), and clean
HTML templates.

You had earlier asked the question whether to do things client or server
side - well in this example, the correct answer is to do it client-side.
(Yours is a read-only application, where none of the advantages of
server-side processing applies.)

 - Godmar

On Mon, Dec 5, 2011 at 6:18 PM, Nate Hill  wrote:

> Something quite like that, my friend!
> Cheers
> N
>
> On Mon, Dec 5, 2011 at 3:10 PM, Walker, David wrote:
>
> > I gotcha.  More information is, indeed, better. ;-)
> >
> > So, on the PHP side, you just need to grab the term from the  query
> > string, like this:
> >
> >  $searchterm = $_GET['query'];
> >
> > And then in your JavaScript code, you'll send an AJAX request, like:
> >
> >  http://www.natehill.net/vizstuff/catscrape.php?query=Cooking
> >
> > Is that what you're looking for?
> >
> > --Dave
> >
> > -
> > David Walker
> > Library Web Services Manager
> > California State University
> >
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> > Nate Hill
> > Sent: Monday, December 05, 2011 3:00 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
> >
> > As always, I provided too little information.  Dave, it's much more
> > involved than that
> >
> > I'm trying to make a kind of visual browser of popular materials from one
> > of our branches from a .csv file.
> >
> > In order to display book covers for a series of searches by keyword, I
> > query the catalog, scrape out only the syndetics images, and then
> display 4
> > of them.  The problem is that I've hardcoded in a search for 'Drawing',
> > rather than dynamically pulling the correct term and putting it into the
> > catalog query.
> >
> > Here's the work in process, and I believe it will only work in Chrome
> > right now.
> > http://www.natehill.net/vizstuff/donerightclasses.php
> >
> > I may have a solution, Jason's idea got me part way there.  I looked all
> > over the place for that little snippet he sent over!
> >
> > Thanks!
> >
> >
> >
> > On Mon, Dec 5, 2011 at 2:44 PM, Walker, David wrote:
> >
> > > > And I want to update 'Drawing' to be 'Cooking'  w/ a jQuery hover
> > > > effect on the client side then I need to make an Ajax request,
> correct?
> > >
> > > What you probably want to do here, Nate, is simply output the PHP
> > > variable in your HTML response, like this:
> > >
> > >  
> > >
> > > And then in your JavaScript code, you can manipulate the text through
> > > the DOM like this:
> > >
> > >  $('#foo').html('Cooking');
> > >
> > > --Dave
> > >
> > > -
> > > David Walker
> > > Library Web Services Manager
> > > California State University
> > >
> > >
> > > -Original Message-
> > > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> > > Of Nate Hill
> > > Sent: Monday, December 05, 2011 2:09 PM
> > > To: CODE4LIB@LISTSERV.ND.EDU
> > > Subject: [CODE4LIB] jQuery Ajax request to update a PHP variable
> > >
> > > If I have in my PHP script a variable...
> > >
> > > $searchterm = 'Drawing';
> > >
> > > And I want to update 'Drawing' to be 'Cooking'  w/ a jQuery hover
> > > effect on the client side then I need to make an Ajax request, correct?
> > > What I can't figure out is what that is supposed to look like...
> > > something like...
> > >
> > > $.ajax({
> > >  type: "POST",
> > >  url: "myfile.php",
> > >  data: "...not sure how to write what goes here to make it
> 'Cooking'..."
> > > });
> > >
> > > Any ideas?
> > >
> > >
> > > --
> > > Nate Hill
> > > nathanielh...@gmail.com
> > > http://www.natehill.net
> > >
> >
> >
> >
> > --
> > Nate Hill
> > nathanielh...@gmail.com
> > http://www.natehill.net
> >
>
>
>
> --
> Nate Hill
> nathanielh...@gmail.com
> http://www.natehill.net
>


Re: [CODE4LIB] Examples of Web Service APIs in Academic & Public Libraries

2011-10-08 Thread Godmar Back
On Sat, Oct 8, 2011 at 1:40 PM, Patrick Berry  wrote:

> We're (CSU, Chico) using http://code.google.com/p/googlebooks/ to provide
> easy access to partial and full text books.
>
>
Good to hear.

As an aside, we wrote up some background on how to use widgets and
webservices in a 2010 article published in LITA's ITAL magazine:

http://www.lita.org/ala/mgrps/divs/lita/publications/ital/29/2/back.pdf

 - Godmar



> On Sat, Oct 8, 2011 at 10:33 AM, Michel, Jason Paul  >wrote:
>
> > Hello all,
> >
> > I'm a lurker on this listserv and am interested in gaining some insight
> > into your experiences of utilizing web service APIs in either an academic
> > library or public library setting.
> >
> > I'm writing a book for ALA Editions on the use of Web Service APIs in
> > libraries.  Each chapter covers a specific API by delineating the
> > technicalities of the API, discussing potential uses of the API in
> library
> > settings, and step-by-step tutorials.
> >
> > I'm already including examples of how my library (Miami University in
> > Oxford, Ohio) are utilizing these APIs but would like to give the reader
> > more examples from a variety of settings.
> >
> > APIs covered in the book: Flickr, Vimeo, Google Charts, Twitter, Open
> > Library, LibraryThing, Goodreads, OCLC.
> >
> > So, what are you folks doing with APIs?
> >
> > Thanks for any insight!
> >
> > Kind regards,
> >
> > Jason
> >
> > --
> > Jason Paul Michel
> > User Experience Librarian
> > Miami University Libraries
> > Oxford, Ohio 45044
> > twitter:jpmichel
> >
>


Re: [CODE4LIB] ny times best seller api

2011-09-29 Thread Godmar Back
On Wed, Sep 28, 2011 at 5:02 PM, Michael B. Klein  wrote:

>
> It's not NYTimes.com's fault; it's the cross-site scripting jerks who made
> the security necessary in the first place.
>
>
NYTimes could allow JSONP, but then developers would need to embed their API
key in their web pages, which means the API key would simply be a token used
for statistics, rather than for authentication. It's their choice that they
don't allow that.

Closer to the code4lib community: OCLC and Serials Solutions don't support
JSONP in their webservices, either, even though doing so would allow cool
services and would likely not affect their business models adversely in a
significant way, IMO. We should keep lobbying them to remove these
restrictions, as I've been doing for a while.

 - Godmar


Re: [CODE4LIB] ny times best seller api

2011-09-28 Thread Godmar Back
Are you trying to run this inside a webpage served from a domain other than
nytimes.com?
If so, you'd need to use JSONP, which a cursory examination of their API
documentation reveals they do not support. So, you need to use a proxy.

Here's one:
$ cat hardcover.php
<?php
$cb = $_GET['callback'];
$json = file_get_contents(
  'http://api.nytimes.com/svc/books/v2/lists/hardcover-fiction.json?&api-key=' /* key elided */
);
header("Content-Type: text/javascript");
echo $cb . '(' . $json . ')';
?>

Install it on your webserver, then change your JavaScript code to refer to
it using callback=?.

For instance, if you installed it on
http://libx.lib.vt.edu/services/nytimes/hardcover.php
then you would be using the URL
http://libx.lib.vt.edu/services/nytimes/hardcover.php?callback=?
($.getJSON will replace the ? with a suitably generated function name).
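
For instance, usage from your page would look roughly like this (the
'#container' markup is made up for illustration; note that results is an
array of entries, each carrying its own book_details):

$.getJSON('http://libx.lib.vt.edu/services/nytimes/hardcover.php?callback=?',
    function (data) {
        $.each(data.results, function (i, entry) {
            $('#container').append('<p>' + entry.book_details[0].title + '</p>');
        });
    });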

 - Godmar

On Wed, Sep 28, 2011 at 3:28 PM, Nate Hill  wrote:

> Anybody out there using the NY times best seller API to do stuff on their
> library websites?
> I can't figure out what's wrong with my code here.
> Data is returned as "null"; I can't seem to parse the response with jQuery.
> Any help would be supercool.
> I removed the API key - my code doesn't actually contain ''.
> Here's the jQuery:
>
> jQuery(document).ready(function(){
>$(function(){
>//json request to new york times
>$.getJSON('
>
> http://api.nytimes.com/svc/books/v2/lists/hardcover-fiction.json?&api-key=
> ',
>
>function(data) {
>//loop through the results with the following
> function
>$.each(data.results.book_details, function(i,item){
>//turn the title into a variable
>var bookTitle = item.title;
>$('#container').append(''+bookTitle+'');
>
>});
>});
>});
> });
>
>
> Here's a snippet of the JSON response:
>
> {
>"status": "OK",
>"copyright": "Copyright (c) 2011 The New York Times Company.  All Rights
> Reserved.",
>"num_results": 35,
>"last_modified": "2011-09-23T12:00:29-04:00",
>"results": [{
>"list_name": "Hardcover Fiction",
>"display_name": "Hardcover Fiction",
>"updated": "WEEKLY",
>"bestsellers_date": "2011-09-17",
>"published_date": "2011-10-02",
>"rank": 1,
>"rank_last_week": 0,
>"weeks_on_list": 1,
>"asterisk": 0,
>"dagger": 0,
>"isbns": [{
>"isbn10": "0399157786",
>"isbn13": "9780399157783"
>}],
>"book_details": [{
>"title": "NEW YORK TO DALLAS",
>"description": "An escaped child molester pursues Lt. Eve
> Dallas; by Nora Roberts, writing pseudonymously.",
>"contributor": "by J. D. Robb",
>"author": "J D Robb",
>"contributor_note": "",
>"price": 27.95,
>"age_group": "",
>"publisher": "Putnam",
>"primary_isbn13": "9780399157783",
>"primary_isbn10": "0399157786"
>}],
>"reviews": [{
>"book_review_link": "",
>"first_chapter_link": "",
>"sunday_review_link": "",
>"article_chapter_link": ""
>}]
>
>
> --
> Nate Hill
> nathanielh...@gmail.com
> http://www.natehill.net
>


Re: [CODE4LIB] internet explorer and pdf files

2011-08-31 Thread Godmar Back
On Wed, Aug 31, 2011 at 8:42 AM, Eric Lease Morgan  wrote:

> Eric wrote:
>
> > Unfortunately IE's behavior is weird. The first time someone tries to
> load
> > one of these URL nothing happens. When someone tries to load another one,
> it
> > loads just fine. When they re-try the first one, it loads. We are banging
> > our heads against the wall here at Catholic Pamphlet Central. Networking
> > issue? Port issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus
> > off-campus issue?
>
> Thank you for all the replies.
>
> We're not one hundred percent positive, but we think the issue with IE has
> something to do with headers. As alluded to previously, IE needs/desires
> file name extensions in order to know what to do with incoming files. We are
> serving these PDF documents from Fedora which is sending out a stream, not
> necessarily a file. Apparently this confuses IE. Since Fedora is not really
> designed to be a "file server", we will write a piece of intermediary
> software to act as a go between. This isn't really a big deal since all of
> our other implementations of Fedora are expected to work in the same way.
> Wish us luck.
>
>
FWIW, this is true for any and all HTTP servers.  Only the client's request
specifies a name (as the path component of the request, e.g.,
/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1).

The server's reply does not contain a name at all. It simply specifies the
type and, typically, the length of the returned content. The returned
content itself is just a blob of bytes. Your server says "this blob of
bytes is a PDF object (application/pdf)", but it doesn't specify the length.
Not specifying the length makes the client's job slightly more difficult -
it now has to read the stream until the server closes the connection -
which is why the HTTP/1.1 specification discourages it. It is certainly
possible that IE's PDF plug-in is not prepared to deal with this
situation, and I would certainly fix this first.

 - Godmar


Re: [CODE4LIB] internet explorer and pdf files

2011-08-29 Thread Godmar Back
Earlier versions of IE were known to sometimes disregard the Content-Type
(which you set correctly to application/pdf) and look at the suffix of the
URL instead. For instance, they would render HTML if you served a .html as
text/plain, etc.

You may try creating URLs that end with .pdf

Separately, you're not sending a Content-Length header:

HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Server: Apache-Coyote/1.1
  Pragma: No-cache
  Cache-Control: no-cache
  Expires: Wed, 31 Dec 1969 19:00:00 EST
  Content-Type: application/pdf
  Date: Mon, 29 Aug 2011 19:47:27 GMT
  Connection: close
Length: unspecified [application/pdf]

which disregards RFC 2616,
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13

On Mon, Aug 29, 2011 at 3:30 PM, Eric Lease Morgan  wrote:

> I need some technical support when it comes to Internet Explorer (IE) and
> PDF files.
>
> Here at Notre Dame we have deposited a number of PDF files in a Fedora
> repository. Some of these PDF files are available at the following URLs:
>
>  *
> http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1
>  *
> http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832898/PDF1
>  *
> http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:999332/PDF1
>  *
> http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832657/PDF1
>  *
> http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1001919/PDF1
>  *
> http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832818/PDF1
>  *
> http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:834207/PDF1
>
> Retrieving the URLs with any browser other than IE works just fine.
>
> Unfortunately IE's behavior is weird. The first time someone tries to load
> one of these URL nothing happens. When someone tries to load another one, it
> loads just fine. When they re-try the first one, it loads. We are banging
> our heads against the wall here at Catholic Pamphlet Central. Networking
> issue? Port issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus
> off-campus issue?
>
> Could some of y'all try to load some of the URLs with IE and tell me your
> experience? Other suggestions would be greatly appreciated as well.
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
> (574) 631-8604
>


Re: [CODE4LIB] dealing with Summon

2011-03-02 Thread Godmar Back
On Wed, Mar 2, 2011 at 11:54 AM, Demian Katz  wrote:
>> These are the questions I'm seeking answers to; I know that those of
>> you who have coded their own Summon front-ends must have faced the
>> same questions when implementing their record displays.
>
> Feel free to refer to VuFind's Summon template for reference if that is 
> helpful:
>
> https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/interface/themes/default/Summon/record.tpl
>
> Andrew wrote this originally, and I've tweaked it in a few places to address 
> problems as they arose.  I don't claim that this offers the definitive answer 
> to your questions...  but it's working reasonably well for us so far.
>

Ah, thanks.  As they say, a piece of code speaks a thousand words!

So, to solve the conundrum: only PublicationDate_xml and
PublicationDate are of interest. If the former is given, use it and
print (if available) its .month, .day, and .year fields. Else, if the
latter is given, just print it.
Ignore all other date-related fields. Ignore PublicationDate_xml.text.
If a field carries more than one value, just use the first one.
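
In code, that boils down to something like this (a sketch; the field
names are as in Summon's JSON response, the helper is mine):

function formatPubDate(doc) {
    var xml = doc.PublicationDate_xml;
    if (xml && xml.length > 0) {
        // use whichever of month/day/year are present in the first entry
        var d = xml[0], parts = [];
        if (d.month) parts.push(d.month);
        if (d.day) parts.push(d.day);
        if (d.year) parts.push(d.year);
        return parts.join('/');
    }
    if (doc.PublicationDate && doc.PublicationDate.length > 0)
        return doc.PublicationDate[0];
    return '';
}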

This knowledge will also help me avoid sending unnecessary data to the
LibX client. As you know, Summon requires a proxy that talks to the
actual service, and cutting out redundant and derived fields at the
proxy could save a fair amount of bandwidth (though I'll have to check
if it also shaves off latency.) A typical search response (raw JSON,
with 20 hits) is > 500KB long, so investing computing time at the
proxy in cutting this down may be promising.

 - Godmar


Re: [CODE4LIB] dealing with Summon

2011-03-02 Thread Godmar Back
On Wed, Mar 2, 2011 at 11:36 AM, Walker, David  wrote:
> Just out of curiosity, is there a Summon (API) developer listserv?  Should 
> there be?

Yes, there is - I'm waiting for my subscription there to be approved.

Like I said at the beginning of this thread, this is only tangentially
a Code4Lib issue, and certainly the details aren't.  But perhaps the
general problem is (?)

 - Godmar


Re: [CODE4LIB] dealing with Summon

2011-03-02 Thread Godmar Back
On Wed, Mar 2, 2011 at 11:12 AM, Roy Tennant  wrote:
> Godmar,
> I'm surprised you're asking this. Most of the questions you want
> answered could be answered by a basic programming construct: an
> if-then-else statement and a simple decision about what you want to
> use in your specific application (for example, do you prefer "text"
> with the period, or not?). About the only question that such a
> solution wouldn't deal with is "which fields are derived from which
> others", which strikes me as superfluous to your application if you
> know a hierarchy of preference. But perhaps I'm missing something
> here.

I'm not asking how to code it, I'm asking for the algorithm I should
use, given the fact that I'm not familiar with the provenance and
status of the data Summon returns (which, I understand, is a mixture
of original, harvested data, and "cleaned-up", processed data.)

Can you suggest such an algorithm, given the fact that each of the 8
elements I showed in the example (PublicationDateYear,
PublicationDateDecade, PublicationDate, PublicationDateCentury,
PublicationDate_xml.text, PublicationDate_xml.day,
PublicationDate_xml.month, PublicationDate_xml.year) is optional?  But
wait: I think I've also seen records where there is a
PublicationDateMonth, and records where some values have arrays of
length > 1.

Can you suggest, or at least outline, such an algorithm?

It would be helpful to know, for instance, if the presence of a
PublicationDate_xml field supplants any other PublicationDate* fields
(does it?)  If a PublicationDate_xml field is absent, which field
would I want to look at next?  Is PublicationDate more reliable than a
combination of PublicationDateYear and PublicationDateMonth (and
perhaps PublicationDateDay if it exists?)?

If the PublicationDate_xml is present, then: should I prefer the .text
option?  What's the significance of that dot? Is it spurious, like the
identifier you mentioned you find in raw MARC records?  If not, what,
if anything, is known about the presence of the other fields?  What if
multiple fields are given in an array?  Is the ordering significant
(e.g., the first one is more trustworthy?) Or should I sort them based
on a heuristic?  (e.g., if "20100523" and "201005" are given, prefer
the former?)  What if the data is contradictory?

These are the questions I'm seeking answers to; I know that those of
you who have coded their own Summon front-ends must have faced the
same questions when implementing their record displays.

 - Godmar


Re: [CODE4LIB] dealing with Summon

2011-03-02 Thread Godmar Back
On Tue, Mar 1, 2011 at 11:14 PM, Roy Tennant  wrote:
>> On Tue, Mar 1, 2011 at 2:14 PM, Godmar Back  wrote:
>>
>>Similarly, the date associated with a record can come in a variety of
>>formats. Some are single-field (20080901), some are abbreviated
>>(200811), some are separated into year, month, date, etc.  Some
>>records have a mixture of those.
>
> In this world of MARC (s/MARC/hurt) I call that an embarrassment of
> riches. I've spent some bit of time parsing MARC, especially lately,
> and just the fact that Summon provides a normalized date element is
> HUGE.

That's great to hear - but how do I know which elements to use?

For instance, look at the JSON excerpt at
http://api.summon.serialssolutions.com/help/api/search/response/documents

 "PublicationDateCentury":[
  "1900"
],
"PublicationDateDecade":[
  "1970"
],
"PublicationDateYear":[
  "1979"
],
"PublicationDate":[
  "1979."
],
"PublicationDate_xml":[
  {
"day":"01",
"month":"01",
"text":"1979.",
"year":"1979"
  }
],

Which one is the cleaned up date, and in which order shall I be
looking for the date field in the record when some or all of this
information is missing in a particular record?

Andrew responded to that if given, PublicationDate_xml is the
preferred one - but this raises the question which field in
PublicationDate_xml to use: .text, .day, or .year?  What if some are
missing?
What if PublicationDate_xml is missing, then I use or look for
PublicationDate?  Or is PublicationDateYear/Month/Decade preferred to
PublicationDate?  Which fields are derived from which others?

These are the types of questions I'm looking to answer.

 - Godmar


Re: [CODE4LIB] detecting user copying URL?

2010-12-01 Thread Godmar Back
On Thu, Dec 2, 2010 at 12:25 AM, Susan Kane wrote:

> Absolutely this should be solved by the vendors / content providers but --
> just for the sake of argument -- it is a possible extension for LibX?
>
> You can't send a standard message everytime a user copies a URL from their
> address bar -- they would kill you.
>
> Is there a way for a browser plugin to "know" that the user is on a
> specific
> website and to warn them for such actions while there?
>
> Or would that level of coordination between the website and the address bar
> be (a) impossible or (b) not really not worth the effort or (c) a serious
> privacy concern?
>
>
Extensions such as LibX can certainly interpose when users bookmark items,
at least in Firefox (and possibly Chrome). The question is how to determine
if a URL is bookmarkable or not. This could be done either by consulting a
database - online or built-in, or perhaps by using heuristics (for instance,
URLs containing session ids are often not bookmarkable.)
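
For that last heuristic, even something crude could go a long way (a
sketch; the parameter names are just examples):

// guess that a URL carrying a session id is a poor bookmarking target
var looksEphemeral = /;jsessionid=|[?&](phpsessid|sessionid|session_id)=/i.test(url);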

 - Godmar


Re: [CODE4LIB] C4L2011 Voting for Prepared Talks

2010-12-01 Thread Godmar Back
"through Dec 1" typically means until Dec 1, 23:59pm (in some time zone) -
yet the page says voting is closed.

Could this be fixed?

 - Godmar

On Mon, Nov 29, 2010 at 5:02 PM, McDonald, Robert H.
wrote:

> Just a reminder that voting for prepared talks for code4lib 2011 is ongoing
> and open through Dec 1, 2010.
>
> Please vote if you have not done so already.
>
> To vote - go here - http://vote.code4lib.org/election/index/17
>
> If you have never voted before you will need to register here first -
> http://code4lib.org/user/register
>
> Thanks
>
> Robert
>
> **
> Robert H. McDonald
> Associate Dean for Library Technologies and Digital Libraries
> Associate Director, Data to Insight Center-Pervasive Technology Institute
> Executive Director, Kuali OLE
> Frye Leadership Institute Fellow 2009
> Indiana University
> Herman B Wells Library 234
> 1320 East 10th Street
> Bloomington, IN 47405
> Phone: 812-856-4834
> Email: rob...@indiana.edu
> Skype/GTalk: rhmcdonald
> AIM/MSN: rhmcdonald1
>


[CODE4LIB] Q: Summon API Service?

2010-10-27 Thread Godmar Back
Hi,

Unlike Link/360, Serials Solutions' Summon API is extremely cumbersome to
use - requiring, for instance, that requests be digitally signed. (*)

Has anybody developed a proxy server for Summon that makes its API public
(e.g. receives requests, signs them, forwards them to Summon, and relays the
result back to an HTTP client?)

Serials Solutions publishes some PHP5 and Ruby sample code in two API
libraries (**), but these don't appear to be fully fledged nor
easy-to-install solutions.  (Easy to install here is defined as an average
systems librarian can download them, provide the API key, and have a running
solution in less time than it takes to install Wordpress.)

Thanks!

 - Godmar

(*) http://api.summon.serialssolutions.com/help/api/authentication
(**) http://api.summon.serialssolutions.com/help/api/code


Re: [CODE4LIB] Safari extensions

2010-08-06 Thread Godmar Back
On Fri, Aug 6, 2010 at 8:19 AM, Joel Marchesoni  wrote:
> Honestly I try to switch to Chrome every month or so, but it just doesn't do 
> what Firefox does for me. I've actually been using a Firefox mod called Pale 
> Moon [1] that takes out some of the not so useful features for work (parental 
> controls, etc) and optimizes for current processors. It's not a huge speed 
> increase, but it is definitely noticeable.
>

Chrome is certainly behind Firefox in its extension capability. For
instance, it doesn't allow the extension of context menus yet (planned
for later this year or next), and even the planned API will be less
flexible than Firefox's. It is hobbled by the fact that the browser
is not itself written using the same markup language as its
extensions, so Google's programmers have to add an API (along with a
C++ implementation) for every feature they want supported.

Regarding JavaScript performance, both Firefox and Chrome have
just-in-time compilers in their engines (Chrome uses V8, Firefox uses
TraceMonkey), each providing an order of magnitude or two of speedup
compared to the interpreters that were used in FF 3.0 and before.

Regarding resource usage, it's difficult to tell. Firefox is certainly
a memory hog, with internal memory leaks, but when the page itself is
the issue (perhaps because the JavaScript programmer leaked memory),
then both browsers are affected. In Chrome, I've observed two
problems. First, if a page leaks, then the corresponding tab will
simply ask for more memory from the OS. There are no resource controls
at this point. The effect is the same as in Firefox. Second, each page
is scheduled separately by the OS. I've observed that Chrome tabs slow
to a halt in Windows XP because the OS is starving a tab's thread if
there are CPU-bound activities on the machine, making Chrome actually
very difficult to use.

 - Godmar


Re: [CODE4LIB] Safari extensions

2010-08-05 Thread Godmar Back
On Thu, Aug 5, 2010 at 4:15 PM, Raymond Yee  wrote:
> Has anyone given thought to how hard it would be to port Firefox extensions
> such as LibX and  Zotero to Chrome or Safari?  (Am I the only one finding
> Firefox to be very slow compared to Chrome?)

We have ported LibX to Chrome, see http://libx.org/releases/gc/

Put briefly, Chrome provides an extension API that is entirely
JavaScript/HTML based. As such, existing libraries such as jQuery can
be used to implement the extensions' user interface (such as LibX's
search box, implemented as a browser action). Unlike Firefox, no
coding in a special-purpose user interface markup language such as XUL
is required. (That said, it's possible to achieve the same in Firefox,
and in fact we're now using the same HTML/JS code in Firefox, reducing
the XUL-specific code to a minimum). Safari will use the same approach.

Chrome also supports content scripts that interact with the page a
user is looking at. These scripts live in an environment that is
similar to the environment seen by client-side code coming from the
origin. In this sense, it's very similar to how Firefox works with its
sandboxes, with the exception mentioned in my previous email that all
communication outside has to be done via message passing (sending
JSON-encoded objects back and forth).
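
In practice that looks roughly like this (a sketch using the
chrome.extension messaging calls of that era; the message fields are
made up):

// in the content script:
chrome.extension.sendRequest({ type: "lookupISBN", isbn: "9780399157783" },
    function (response) {
        // ... annotate the page using response.metadata ...
    });

// in the extension's background page:
chrome.extension.onRequest.addListener(function (request, sender, sendResponse) {
    if (request.type === "lookupISBN")
        sendResponse({ metadata: null /* ... */ });
});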

 - Godmar


Re: [CODE4LIB] Safari extensions

2010-08-05 Thread Godmar Back
No, nothing beyond a quick read-through.

The architecture is similar to Google Chrome's - which is perhaps not
surprising given that both Safari and Chrome are based on WebKit -
which for us at LibX means we should be able to leverage the redesign
we did for LibX 2.0.

A notable characteristic of this architecture is that content scripts
that interact with a page are in a separate OS process from the "main"
extension's code, so they have to communicate with the main
extension via message passing rather than by exploiting direct method
calls as in Firefox.

 - Godmar

On Thu, Aug 5, 2010 at 4:04 PM, Eric Hellman  wrote:
> Has anyone played with the new Safari extensions capability? I'm looking at 
> you, Godmar.
>
>
> Eric Hellman
> President, Gluejar, Inc.
> 41 Watchung Plaza, #132
> Montclair, NJ 07042
> USA
>
> e...@hellman.net
> http://go-to-hellman.blogspot.com/
> @gluejar
>


Re: [CODE4LIB] SerSol 360Link API?

2010-04-19 Thread Godmar Back
I wrote a to-JSON proxy a while ago:
http://libx.lib.vt.edu/services/link360/index.html

I found that Link360 doesn't handle load very well. Even a small burst of
requests leads to a spike in latency and error responses. I asked SS if this
was a bug or part of some intentional throttling attempt, but never received
a reply. Didn't pursue it further.

 - Godmar

On Mon, Apr 19, 2010 at 2:42 AM, David Pattern wrote:

> Hiya
>
> We're using it to add e-holdings into to our OPAC, e.g.
> http://library.hud.ac.uk/catlink/bib/396817/
>
> I've also tried using the API to add the coverage info to the
> "availability" text for journals in Summon (e.g. "Availability: print
> (1998-2005) & electronic (2000-present)").
>
> I've made quite a few tweaks to our 360 Link (mostly using jQuery), so I'm
> half tempted to have a go using the API to develop a complete replacement
> for 360 Link.  If anyone's already done that, I'd be keen to hear more.
>
> regards
> Dave Pattern
> University of Huddersfield
>
> 
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan
> Rochkind [rochk...@jhu.edu]
> Sent: 19 April 2010 03:50
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] SerSol 360Link API?
>
> Is anyone using the SerSol 360Link API in a real-world production or
> near-production application?  If so, I'm curious what you are using it for,
> what your experiences have been, and in particular if you have information
> on typical response times of their web API.  You could reply on list or off
> list just to me. If I get interesting information especially from several
> sources, I'll try to summarize on list and/or blog either way.
>
> Jonathan
>
>
> ---
> This transmission is confidential and may be legally privileged. If you
> receive it in error, please notify us immediately by e-mail and remove it
> from your system. If the content of this e-mail does not relate to the
> business of the University of Huddersfield, then we do not endorse it and
> will accept no liability.
>


Re: [CODE4LIB] Conference followup; open position at Google Cambridge

2010-03-12 Thread Godmar Back
On Fri, Mar 12, 2010 at 4:58 PM, Emily Lynema  wrote:

> We'd really like to use the cover images from Google in various catalog
> tools, but have noticed in the past that the only cover image info provided
> in the Google Books API is for the small thumbnails. It would be nice to
> also provide links to the other image sizes available, if there are any.
>
>
Though not sanctioned by Google, this JavaScript replacement expression
leads from the thumbnail urls provided by the Google book API to a url
pointing at a larger cover image:

 imgurl = imgurl.replace(/&zoom=5&/, "&zoom=1&").replace(/&pg=PP1&/,
"&printsec=frontcover&");

 - Godmar


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Godmar Back
Thanks for the Internet Archive pointer. Hadn't thought of it (probably
because of a few past unsuccessful attempts to find archived pages.)

Tried BadgerFish (
http://libx.lib.vt.edu/services/code4lib/lccnrelay3/2004022563 which proxies
lccn.loc.gov's marcxml) and it meets the requirements of faithful
reproduction of the XML, albeit in a very verbose way that doesn't attempt
to do any minimization.
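
(For those unfamiliar with the convention: BadgerFish maps, roughly,
<alice charlie="david">bob</alice> to

  { "alice": { "@charlie": "david", "$": "bob" } }

- attributes get an "@" prefix and text content ends up under "$" -
which is where the verbosity comes from.)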

That leaves, indeed, two independent problems:

a) a free converter to GData's JSON format or another less redundant
convention than badgerfish. Looking at this for 1 second, I'm wondering if
this is even possible to implement without knowing the schema of the XML
document. It says, for instance, to use arrays [] for elements that may
occur more than once.

b) something MARC-specific to express MARC records in JSON. I talked to
Nathan Trail from LOC at code4lib, and they're revamping their lccn server
this year to scale up and also serve more formats. Presumably, this effort
could lead to a de-facto standard of how to serve MARC in JSON.

Thinking out loud about this for a minute, I'm wondering if part a) is
really a worthwhile goal. Aside from the impromptu prototyping of an
XML-to-JSON gateway, I don't see any production use for an XML-to-JSON
converter that is agnostic to the schema, for performance reasons alone.

 - Godmar


Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Godmar Back
On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer wrote:

> Hi,
> try this: http://code.google.com/p/xml2json-xslt/
>
>
I should have mentioned that I already tried everything I could find after
googling - this stylesheet doesn't meet the requirements, not by a long shot. It
drops attributes just like simplexml_json does.

The one thing I didn't try is a program called 'BadgerFish.php' which I
couldn't locate - Google once indexed it at badgerfish.ning.com

 - Godmar


[CODE4LIB] Q: XML2JSON converter

2010-03-04 Thread Godmar Back
Hi,

Can anybody recommend an open source XML2JSON converter in PhP or
Python (or potentially other languages, including XSLT stylesheets)?

Ideally, it should implement one of the common JSON conventions, such
as Google's JSON convention for GData [1], but anything that preserves
all elements, attributes, and text content of the XML file would be
acceptable.

Note that json_encode(simplexml_load_file(...)) does not meet this
requirement - in fact, nothing based on simplexml_load_file() will.
(It can't even load MarcXML correctly).

Thanks!

 - Godmar

[1] http://code.google.com/apis/gdata/docs/json.html


Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-19 Thread Godmar Back
On Tue, Jan 19, 2010 at 10:09 AM, Sean Hannan  wrote:
> I've had the best experience (query speed, primarily) with BaseX.  This was 
> primarily for large XML document processing, so I'm not sure how much it will 
> satisfy your transactional needs.
>
> I was initially using eXist, and then switched over to BaseX because the 
> speed gains were very noticeable.
>

What about the relative maturity/functionality of eXist vs BaseX? I'm
a bit skeptical to put my eggs in a University project basket not
backed by a continuous revenue stream (... did I just say that out
loud?)

 - Godmar


[CODE4LIB] Q: what is the best open source native XML database

2010-01-16 Thread Godmar Back
Hi,

we're currently looking for an XML database to store a variety of
small-to-medium sized XML documents. The XML documents are
unstructured in the sense that they do not follow a schema or DTD, and
that their structure will be changing over time. We'll need to do
efficient searching based on elements, attributes, and full text
within text content. More importantly, the documents are mutable.
We'll like to bring documents or fragments into memory in a DOM
representation, manipulate them, then put them back into the database.
Ideally, this should be done in a transaction-like manner. We need to
efficiently serve document fragments over HTTP, ideally in a manner
that allows for scaling through replication. We would prefer strong
support for Java integration, but it's not a must.

Have other encountered similar problems, and what have you been using?

So far, we're researching: eXist-DB (http://exist.sourceforge.net/ ),
Base-X (http://www.basex.org/ ), MonetDB/XQuery
(http://www.monetdb.nl/XQuery/ ), Sedna
(http://modis.ispras.ru/sedna/index.html ). Wikipedia lists a few
others here: http://en.wikipedia.org/wiki/XML_database
I'm wondering to what extent systems such as Lucene, or even digital
object repositories such as Fedora could be coaxed into this usage
scenario.

Thanks for any insight you have or experience you can share.

 - Godmar


Re: [CODE4LIB] ipsCA Certs

2010-01-04 Thread Godmar Back
Hi,

in my role as unpaid tech advisor for our local library, may I ask a
question about the ipsCA issue?

Is my understanding correct that ipsCA currently reissues certificates [1]
signed with a root CA that is not yet in Mozilla products, due to IPS's
delaying the necessary vetting process [2]? In other words, Mozilla users
would see security warnings even if a reissued certificate was used?

The reason I'm confused is that I, like David, saw a number of still valid
certificates from "IPS Internet publishing Services s.l." already shipping
with Firefox, alongside the now-expired certificate. But I suppose those
certificates are for something else and the reissued certificates won't be
signed using them?

Thanks,

 - Godmar

[2] https://bugzilla.mozilla.org/show_bug.cgi?id=529286
[1] http://certs.ipsca.com/Support/hierarchy-ipsca.asp

On Thu, Dec 17, 2009 at 4:02 PM, John Wynstra  wrote:

> Out of curiosity, did anyone else using ipsCA certs receive notification
> that due to the coming expiration of their root CA (December 29,2009), they
> would need a reissued cert under a new root CA?
>
> I am uncertain as to how this new Root CA will become a part of the
> browsers trusted roots without some type of user action including a software
> upgrade, but the following library website instructions lead me to believe
> that this is not going to be smooth.  http://bit.ly/53Npel
>
> We are just about to go live with EZProxy in January with an ipsCA cert
> issued a few months ago, and I am not about to do that if I have serious
> browser support issue.
>
>
> --
> <><><><><><><><><><><><><><><><><><><>
> John Wynstra
> Library Information Systems Specialist
> Rod Library
> University of Northern Iowa
> Cedar Falls, IA  50613
> wyns...@uni.edu
> (319)273-6399
> <><><><><><><><><><><><><><><><><><><>
>


Re: [CODE4LIB] Character problems with tictoc

2009-12-21 Thread Godmar Back
On Mon, Dec 21, 2009 at 2:09 PM, Glen Newton  wrote:
>
> The file I got with wget is:
>  http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt
>

(Just to convince myself I'm not going nuts...) - this file, which
Glen downloaded with wget, appears double-encoded:

# curl -s http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt | od -a -t x1
| head -1082 | tail -4
0020660   -   3   6   8   2  nl   2   2   1  ht   A   c   t   a  sp   O
2d 33 36 38 32 0a 32 32 31 09 41 63 74 61 20 4f
0020700   r   t   o   p   C etx   B   )   d   i   c   a  sp   B   r   a
72 74 6f 70 c3 83 c2 a9 64 69 63 61 20 42 72 61

 - Godmar


Re: [CODE4LIB] Character problems with tictoc

2009-12-21 Thread Godmar Back
I believe they've changed it while we were having the discussion.

When I downloaded the file (with curl), it looked like this:

0020700   r   t   o   p   C etx   B   )   d   i   c   a  sp   B   r   a
72 74 6f 70 c3 83 c2 a9 64 69 63 61 20 42 72 61
0020720   s   i   l   e   i   r   a  ht   h   t   t   p   :   /   /   w
73 69 6c 65 69 72 61 09 68 74 74 70 3a 2f 2f 77

 - Godmar

On Mon, Dec 21, 2009 at 2:24 PM, Erik Hetzner  wrote:
> At Mon, 21 Dec 2009 14:09:28 -0500,
> Glen Newton wrote:
>>
>> It seems that different people are seeing different things in their
>> respective viewers (i.e some are OK and others are like what I am
>> seeing).
>>
>> When I use wget and view the local file in Firefox (3.0.4, Linux Suse
>> 11.0) I see:
>>  http://cuvier.cisti.nrc.ca/~gnewton/tictoc1.gif
>> [gif used as it is not lossy]
>>
>> The text is clearly not correct.
>>
>> The file I got with wget is:
>>   http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt
>>
>> Is this just a question of different client software (and/or OSes)
>> viewing or mangling the content?
>
> When dealing with character set issues (especially the dreaded
> double-encoding!) I find it best to use hex editors or dumpers. If in
> emacs, try M-x hexl-find-file. On a Unix command line, the od or hd
> commands are useful.
>
> For the record:
>
>   48 54 54 50 2f 31 2e 31  20 32 30 30 20 4f 4b 0d  |HTTP/1.1 200 OK.|
> 0010  0a 44 61 74 65 3a 20 4d  6f 6e 2c 20 32 31 20 44  |.Date: Mon, 21 D|
> 0020  65 63 20 32 30 30 39 20  31 39 3a 32 32 3a 33 38  |ec 2009 19:22:38|
> 0030  20 47 4d 54 0d 0a 53 65  72 76 65 72 3a 20 41 70  | GMT..Server: Ap|
> 0040  61 63 68 65 2f 32 2e 32  2e 31 33 20 28 55 6e 69  |ache/2.2.13 (Uni|
> 0050  78 29 20 6d 6f 64 5f 73  73 6c 2f 32 2e 32 2e 31  |x) mod_ssl/2.2.1|
> 0060  33 20 4f 70 65 6e 53 53  4c 2f 30 2e 39 2e 38 6b  |3 OpenSSL/0.9.8k|
> 0070  20 50 48 50 2f 35 2e 33  2e 30 20 44 41 56 2f 32  | PHP/5.3.0 DAV/2|
> 0080  0d 0a 58 2d 50 6f 77 65  72 65 64 2d 42 79 3a 20  |..X-Powered-By: |
> 0090  50 48 50 2f 35 2e 33 2e  30 0d 0a 43 6f 6e 74 65  |PHP/5.3.0..Conte|
> 00a0  6e 74 2d 54 79 70 65 3a  20 74 65 78 74 2f 70 6c  |nt-Type: text/pl|
> 00b0  61 69 6e 3b 20 63 68 61  72 73 65 74 3d 75 74 66  |ain; charset=utf|
> 00c0  2d 38 0d 0a 54 72 61 6e  73 66 65 72 2d 45 6e 63  |-8..Transfer-Enc|
> 00d0  6f 64 69 6e 67 3a 20 63  68 75 6e 6b 65 64 0d 0a  |oding: chunked..|
> ...
> 2230  4f 72 74 68 6f 70 61 65  64 69 63 61 09 68 74 74  |Orthopaedica.htt|
> 2240  70 3a 2f 2f 69 6e 66 6f  72 6d 61 68 65 61 6c 74  |p://informahealt|
> 2250  68 63 61 72 65 2e 63 6f  6d 2f 61 63 74 69 6f 6e  |hcare.com/action|
> 2260  2f 73 68 6f 77 46 65 65  64 3f 6a 63 3d 6f 72 74  |/showFeed?jc=ort|
> 2270  26 74 79 70 65 3d 65 74  6f 63 26 66 65 65 64 3d  |&type=etoc&feed=|
> 2280  72 73 73 09 31 37 34 35  2d 33 36 37 34 09 31 37  |rss.1745-3674.17|
> 2290  34 35 2d 33 36 38 32 0a  32 32 31 09 41 63 74 61  |45-3682.221.Acta|
> 22a0  20 4f 72 74 6f 70 c3 a9  64 69 63 61 20 42 72 61  | Ortop..dica Bra|
> 22b0  73 69 6c 65 69 72 61 09  68 74 74 70 3a 2f 2f 77  |sileira.http://w|
> ...
>
> best,
> Erik Hetzner
>
> ;; Erik Hetzner, California Digital Library
> ;; gnupg key id: 1024D/01DB07E3
>
>


Re: [CODE4LIB] Character problems with tictoc

2009-12-21 Thread Godmar Back
The string in question is double-encoded, that is, a string that was
already in UTF-8 was run through a UTF-8 encoder a second time.

The string is "Acta Ortopedica" where the 'e' is really '\u00e9' aka
'Latin Small Letter E with Acute'. [1]

In UTF-8, the e-acute is two-byte encoded as C3 A9.  If you run the
bytes C3 A9 through a UTF-8 encoder again, C3 ('\u00c3', Capital A with
tilde) becomes C3 83, and A9 ('\u00a9', the copyright sign) becomes C2 A9.
C3 83 C2 A9 is exactly what JISC is serving; what it should be serving
is C3 A9.
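
You can see the same thing from JavaScript, since encodeURIComponent
encodes to UTF-8:

encodeURIComponent('\u00e9')        // "%C3%A9"       - what they should serve
encodeURIComponent('\u00c3\u00a9')  // "%C3%83%C2%A9" - what they actually serve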

Send email to them.

 - Godmar

[1] http://www.utf8-chartable.de/

2009/12/21 Glen Newton 
>
> [I realise there was a recent related 'Character-sets for dummies'[1]
> discussion recently]
>
> I am using tictocs[2] list of journal RSS feeds, and I am getting
> gibberish in places for diacritics. Below is an example:
>
> in emacs:
>  221    Acta Ortop  dica Brasileira     
> http://www.scielo.br/rss.php?pid=1413-7852&lang=en      1413-7852
> in Firefox:
>  221    Acta Ortop  dica Brasileira     
> http://www.scielo.br/rss.php?pid=1413-7852&lang=en      1413-7852
>
> Note that the emacs view is both of a save of the Firefox, and from a
> direct download using 'wget'.
>
> Is this something on my end, or are the tictocs people not serving
> proper UTF-8?
>
> The HTTP header from wget claims UTF-8:
> > wget -S http://www.tictocs.ac.uk/text.php
> > --2009-12-21 12:47:59--  http://www.tictocs.ac.uk/text.php
> > Resolving www.tictocs.ac.uk... 130.88.101.131
> > Connecting to www.tictocs.ac.uk|130.88.101.131|:80... connected.
> > HTTP request sent, awaiting response...
> >   HTTP/1.1 200 OK
> >   Date: Mon, 21 Dec 2009 17:42:05 GMT
> >   Server: Apache/2.2.13 (Unix) mod_ssl/2.2.13 OpenSSL/0.9.8k PHP/5.3.0 DAV/2
> >   X-Powered-By: PHP/5.3.0
> >   Content-Type: text/plain; charset=utf-8
> >   Connection: close
> > Length: unspecified [text/plain]
> >
>
> Can someone validate if they are also experiencing this issue?
>
> Thanks,
> Glen
>
> [1]https://listserv.nd.edu/cgi-bin/wa?S2=CODE4LIB&q=&s=character-sets+for+dummies&f=&a=&b=
> [2]http://www.tictocs.ac.uk/text.php
>
> --
> Glen Newton | glen.new...@nrc-cnrc.gc.ca
> Researcher, Information Science, CISTI Research
> & NRC W3C Advisory Committee Representative
> http://tinyurl.com/yvchmu
> tel/t l: 613-990-9163 | facsimile/t l copieur 613-952-8246
> Canada Institute for Scientific and Technical Information (CISTI)
> National Research Council Canada (NRC)| M-55, 1200 Montreal Road
> http://www.nrc-cnrc.gc.ca/
> Institut canadien de l'information scientifique et technique (ICIST)
> Conseil national de recherches Canada | M-55, 1200 chemin Montr al
> Ottawa, Ontario K1A 0R6
> Government of Canada | Gouvernement du Canada
> --


Re: [CODE4LIB] SerialsSolutions Javascript Question

2009-10-28 Thread Godmar Back
On Wed, Oct 28, 2009 at 9:49 PM, Michael Beccaria
wrote:

> I should clarify. The most granular piece of information in the html is
> a "class" attribute (i.e. there is no "id"). Here is a snippet:
>
> 
> 
> Annals of forest
> science.  class="SS_JournalISSN">(1286-4560)
>
>
> I want to alter the "(1286-4560)"
> section. Maybe add some html after the issn that tells whether it is
> peer reviewed or not.
>
>
Yes - you'd write code similar to this one (the xISSN URL below is from
memory - double-check it against OCLC's documentation):

$(document).ready(function () {
    $(".SS_JournalISSN").each(function () {
        var issn = $(this).text().replace(/[^\dxX]/g, "");
        var self = this;
        $.getJSON("http://xissn.worldcat.org/webservices/xid/issn/" + issn +
                "?method=getMetadata&format=json&callback=?",
                function (data) {
            $(self).append(/* data ... [ 'is peer reviewed' ] */);
        });
    });
});

 - Godmar


Re: [CODE4LIB] Setting users google scholar settings

2009-07-15 Thread Godmar Back
It used to be you could just GET the corresponding form, e.g.:

http://scholar.google.com/scholar_setprefs?num=10&instq=&inst=sfx-f7e167eec5dde9063b5a8770ec3aaba7&q=einstein&inststart=0&submit=Save+Preferences

 - Godmar

On Wed, Jul 15, 2009 at 3:17 AM, Stuart Yeates wrote:
> It's possible to send users to google scholar using URLs such as:
>
> http://scholar.google.co.nz/schhp?hl=en&inst=8862113006238551395
>
> where the institution is obtained using the standard preference setting 
> mechanism. Has anyone found a way of persisting this setting in the users 
> browser, so when they start a new session this is the default?
>
> Yes, I know they can go "Scholar Preferences" -> "Save" to persist it, but 
> I'm looking for a more automated way of doing it...
>
> cheers
> stuart
>


Re: [CODE4LIB] tricky mod_rewrite

2009-07-01 Thread Godmar Back
On Wed, Jul 1, 2009 at 10:38 AM, Walker, David  wrote:

> > They can create .htaccess files, but don't always
> > have control of the main Apache httpd.conf or the
> > root directory.
>
> Just to be clear, I didn't mean just the root directory itself.  If
> .htacess lives within a sub-directory of the Apache root, then you _don't_
> need RewriteBase.
>
> RewriteBase is only necessary when you're in a virtual directory, which is
> physically located outside of Apache's DocumentRoot path.
>
> Correct me if I'm wrong.
>

You are correct!  If I omit the RewriteBase, it still works in this case.

Let's have some more of that sendmail koolaid and up the challenge.

How can I write an .htaccess that's path-independent if I'd like to exclude
certain files in that directory, such as index.html?  So far, I've been
doing:

RewriteCond %{REQUEST_URI} !^/services/tictoclookup/standalone/index.html

To avoid running my script for index.html.  How would I do that?  (Hint: the
use of SERVER variables on the right-hand side in the CondPattern of a
RewriteCond is not allowed, but some trickery may be possible, according to
http://www.issociate.de/board/post/495372/Server-Variables_in_CondPattern_of_RewriteCond_directive.html)

 - Godmar


Re: [CODE4LIB] tricky mod_rewrite

2009-07-01 Thread Godmar Back
On Wed, Jul 1, 2009 at 10:18 AM, Walker, David  wrote:

> > Is it possible to write a .htaccess file that works
> > *no matter* where it is located
>
> I don't believe so.
>
> If the .htaccess file lives in a directory inside of the Apache root
> directory, then you _don't_ need to specify a RewriteBase.  It's really only
> necessary when .htacess lives in a virtual directory outside of the Apache
> root.
>

I see.

Unfortunately, that's the common deployment case for non-administrators (many
librarians). They can create .htaccess files, but don't always have control
of the main Apache httpd.conf or the root directory.

 - Godmar


Re: [CODE4LIB] tricky mod_rewrite

2009-07-01 Thread Godmar Back
On Wed, Jul 1, 2009 at 9:13 AM, Peter Kiraly  wrote:

> From: "Godmar Back" 
>
>> is it possible to write this without hardwiring the RewriteBase in it?  So
>> that it can be used, for instance, in an .htaccess file from within any
>> /path?
>>
>
> Yes, you can put it into a .htaccess file, and the URL rewrite will
> apply on that directory only.
>

You misunderstood the question; let me rephrase it:

Can I write a .htaccess file without specifying the path where the script
will be located in RewriteBase?
For instance, consider
http://code.google.com/p/tictoclookup/source/browse/trunk/standalone/.htaccess
Here, anybody who wishes to use this code has to adapt the .htaccess file to
their path and change the "RewriteBase" entry.

Is it possible to write a .htaccess file that works *no matter* where it is
located, entirely based on where it is located relative to the Apache root
or an Apache directory?

 - Godmar


Re: [CODE4LIB] tricky mod_rewrite

2009-07-01 Thread Godmar Back
On Wed, Jul 1, 2009 at 4:58 AM, Peter Kiraly  wrote:

> Hi Eric,
>
> try this:
>
> 
>  RewriteEngine on
>  RewriteBase /script
>  RewriteCond %{REQUEST_FILENAME} !-f
>  RewriteCond %{REQUEST_FILENAME} !-d
>  RewriteCond %{REQUEST_URI} !=/favicon.ico
>  RewriteRule ^(.*)$ script.cgi?param1=$1 [L,QSA]
> 
>

Here's a challenge question:

is it possible to write this without hardwiring the RewriteBase in it?  So
that it can be used, for instance, in an .htaccess file from within any
/path?

  - Godmar


Re: [CODE4LIB] How to access environment variables in XSL

2009-06-23 Thread Godmar Back
Let me repeat a small comment I already sent to Mike in private email:
in a J2EE environment, information that characterizes a request (such as
path, remote addr, etc.) is not accessible in environment variables or
properties, unlike in a CGI environment. That means that even if you write
an extension for XALAN-J to trigger the execution of your Java code while
processing a stylesheet during a request, you don't normally obtain access
to this information. Rather it is passed by the servlet container to the
servlet via a request object. If you don't control the servlet code - say
because it's vendor-provided - then you have to either rely on any extension
functionality the vendor may provide, or you have to create your own servlet
that wraps the vendor's servlet, saving the request information somewhere
where your xalan extension can retrieve it, then forwards the request to the
vendor's servlet.

 - Godmar


On Tue, Jun 23, 2009 at 2:04 PM, Cloutman, David
wrote:

> I'm in a similar situation in that I've spent the last 6 months cramming
> XSLT in order to do output from an application provided by a vendor. In
> my situation, I'm taking information stored in a CMS database as XML
> fragments and transforming it into our Web site's pages. (The CMS is
> called Cascade, and is okay, but not fantastic.)
>
> The tricky part of this situation is that simply grabbing a book on
> XPath and XSLT will not tell you everything you need to know in order to
> work with your proprietary software. Neither will simply knowing what
> language the middleware layer is written in. Specifically, you need to
> find out from your vendor what XSLT processor their application. In my
> case, I found out that my CMS uses Xalan, which impacts my situation
> significantly, since it limits me to XSLT 1.0. However, the Xalan
> processor does allow for one to script extensions, and in my case I
> _might_ be able to leverage that fact to access some system information,
> depending on what capabilities my vendor has given me. So, in short,
> making the most of the development environment you have in creating your
> XSLT will require you not only to grok the complexities of what I think
> is a rather difficult language to master, but also to gain a good
> understanding of what tools are and are not available to you through
> your preprocessor.
>
> Just to address your original question, XSLT really is not designed to
> work like a conventional programming language per se. You may or may not
> have direct access to environment variables. That is dependent upon how
> the XSLT processor is implemented by your vendor. I did see some
> creative ideas in other posts, and I do not know if they will or will
> not work. However, it is often possible for the middleware layer to pass
> data to the XSLT processor, thus exposing it to the XSLT developer.
> However, what data gets passed to the XSLT developer is generally under
> the control of the application developer.
>
> Here is a quick example of how XML data and XSLT presentation logic can
> be glued together in PHP using a non-native XSLT processor. This is
> being done similarly by our respective Java applications, using
> different XSLT processors, and hopefully a lot more error checking.
>
> http://frenzy.marinlibrary.org/code-samples/php-xslt/middleware.php
>
> In the example, I have passed some environment data to the XSLT
> processor from the PHP middleware layer. As you will see, what data is
> exposed is entirely determined by the PHP.
>
> Good luck!
>
> - David
>
> ---
> David Cloutman 
> Electronic Services Librarian
> Marin County Free Library
>
> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Doran, Michael D
> Sent: Friday, June 19, 2009 2:53 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] How to access environment variables in XSL
>
>
> Hi Dave,
>
> > What XSLT processor and programming language are you using?
>
> I'm embarrassed to say that I'm not sure.  I'm making modifications and
> enhancements to already existing XSL pages that are part of the
> framework of Ex Libris' new Voyager 7.0 OPAC.  This new version of the
> OPAC is running under Apache Tomcat (on Solaris) and my assumption is
> that the programming language is Java; however the source code for the
> app itself is not available to me (and I'm not a Java programmer anyway,
> so it's a moot point).  I assume also that the XSLT processor is what
> comes with Solaris (or Tomcat?).  As you can probably tell, this stuff
> is new to me.  I've been trying to take a Sun Ed XML/XSL class for the
> last year, but it keeps getting cancelled for lack of students.
> Apparently I'm the last person left in the Dallas/Fort Worth area that
> needs to learn this stuff. ;-)
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # do...@uta.edu
> # http://rocky.uta.edu/doran/
>
>
> > -

Re: [CODE4LIB] How to access environment variables in XSL

2009-06-19 Thread Godmar Back
Running in a J2EE is somewhat different from running in a CGI environment.
Specifically, variables such as REMOTE_ADDR, etc. are not stored in
environment variables that are easily accessible.

Assuming that your XSLT is executed for each request (which, btw, is not a
given since Voyager may well be caching the results of the style-sheet
application), your vendor may set up the XSLT processor environment to
provide access to variables related to the current request, for instance,
via XALAN-J extensions. If they did that, it would probably be in the
documentation to which you have access under NDA.

If not, things will be a lot more complicated. You'll probably have to wrap
the servlet in your own; store the current servlet request in a thread-local
variable, then create an xalan extension to access it during the XSLT
processing. That requires a fair bit of Java/J2EE "trickery," but is
definitely possible (and will likely void your warranty.)

 - Godmar

On Fri, Jun 19, 2009 at 9:42 PM, Tom Pasley  wrote:

> Hi,
>
> I see Michael's here too - (he's a bit of a guru on the Voyager-L listserv
> :-D).
>
> Michael, if you have a look at the Vendor URL, there's some info there, but
> you might also try having a look through some of these G.search results:
>
> site:xml.apache.org inurl:"xalan-j" "system"
>
> - see if that helps any - like to help more, but I've got to go!
>
> Tom
>
> On Sat, Jun 20, 2009 at 10:11 AM, Doran, Michael D  wrote:
>
> > Hi Jon,
> >
> > > Try putting the following somewhere in one of the xslt pages
> >
> > Cool!  Here's the output:
> >
> >Version: 1
> >Vendor: Apache Software Foundation
> >Vendor URL: http://xml.apache.org/xalan-j
> >
> > -- Michael
> >
> > # Michael Doran, Systems Librarian
> > # University of Texas at Arlington
> > # 817-272-5326 office
> > # 817-688-1926 mobile
> > # do...@uta.edu
> > # http://rocky.uta.edu/doran/
> >
> >
> > > -Original Message-
> > > From: Code for Libraries [mailto:code4...@listserv.nd.edu] On
> > > Behalf Of Jon Gorman
> > > Sent: Friday, June 19, 2009 5:05 PM
> > > To: CODE4LIB@LISTSERV.ND.EDU
> > > Subject: Re: [CODE4LIB] How to access environment variables in XSL
> > >
> > > Try putting the following somewhere in one of the xslt pages
> > >
> > > Version:
> > > <xsl:value-of select="system-property('xsl:version')"/>
> > >
> > > Vendor:
> > > <xsl:value-of select="system-property('xsl:vendor')"/>
> > >
> > > Vendor URL:
> > > <xsl:value-of select="system-property('xsl:vendor-url')"/>
> > >
> > > Jon
> > >
> > > On Fri, Jun 19, 2009 at 4:53 PM, Doran, Michael
> > > D wrote:
> > > > Hi Dave,
> > > >
> > > >> What XSLT processor and programming language are you using?
> > > >
> > > > I'm embarrassed to say that I'm not sure.  I'm making
> > > modifications and enhancements to already existing XSL pages
> > > that are part of the framework of Ex Libris' new Voyager 7.0
> > > OPAC.  This new version of the OPAC is running under Apache
> > > Tomcat (on Solaris) and my assumption is that the programming
> > > language is Java; however the source code for the app itself
> > > is not available to me (and I'm not a Java programmer anyway,
> > > so it's a moot point).  I assume also that the XSLT processor
> > > is what comes with Solaris (or Tomcat?).  As you can probably
> > > tell, this stuff is new to me.  I've been trying to take a
> > > Sun Ed XML/XSL class for the last year, but it keeps getting
> > > cancelled for lack of students.  Apparently I'm the last
> > > person left in the Dallas/Fort Worth area that needs to learn
> > > this stuff. ;-)
> > > >
> > > > -- Michael
> > > >
> > > > # Michael Doran, Systems Librarian
> > > > # University of Texas at Arlington
> > > > # 817-272-5326 office
> > > > # 817-688-1926 mobile
> > > > # do...@uta.edu
> > > > # http://rocky.uta.edu/doran/
> > > >
> > > >
> > > >> -Original Message-
> > > >> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On
> > > >> Behalf Of Walker, David
> > > >> Sent: Friday, June 19, 2009 2:48 PM
> > > >> To: CODE4LIB@LISTSERV.ND.EDU
> > > >> Subject: Re: [CODE4LIB] How to access environment variables in XSL
> > > >>
> > > >> Micahael,
> > > >>
> > > >> What XSLT processor and programming language are you using?
> > > >>
> > > >> --Dave
> > > >>
> > > >> ==
> > > >> David Walker
> > > >> Library Web Services Manager
> > > >> California State University
> > > >> http://xerxes.calstate.edu
> > > >> 
> > > >> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf
> > > >> Of Doran, Michael D [do...@uta.edu]
> > > >> Sent: Friday, June 19, 2009 12:44 PM
> > > >> To: CODE4LIB@LISTSERV.ND.EDU
> > > >> Subject: [CODE4LIB] How to access environment variables in XSL
> > > >>
> > > >> I am working with some XSL pages that serve up HTML on the
> > > >> web.  I'm new to XSL.   In my prior web development, I was
> > > >> accustomed to being able to access environment variables (and
> > > >> their values, natch) in my CGI scripts and/or via Server Side
> > > >> Includes.  Is there an equivalent mechanism for accessing
> > > >> those environment variables within

Re: [CODE4LIB] FW: [CODE4LIB] Newbie asking for some suggestions with javascript

2009-06-15 Thread Godmar Back
On Mon, Jun 15, 2009 at 4:09 PM, Roy Tennant  wrote:

> It is worth following up on Xiaoming's statement of a limit of 100 uses per
> day of the xISSN service with the information that exceptions to this limit
> are certainly granted. Annette probably knows that just such an exception
> was granted to her LibX project, and LibX remains the single largest user
> of this service.
> Roy


Yes, Roy is correct.

We are very grateful for OCLC's generous support and would like to
acknowledge that publicly.

FWIW, I suggested the inclusion of ticTOCs RSS feed data in the survey OCLC
sent out two weeks ago, and less than a week later, OCLC rolls out the
improved service. Excellent!

[ As an aside, in LibX, we are changing the way we use the service;
previously, we were looking up all ISSNs on any page a user visits; we are
now retrieving the metadata if the user actually hovers over the link. Not
that OCLC complained - but CrossRef did when they noticed > 100,000 hits per
day against their service for DOI metadata lookups. In fairness to CrossRef,
they are working on beefing up their servers as well. ]

 - Godmar & Annette for Team LibX.


Re: [CODE4LIB] Newbie asking for some suggestions with javascript

2009-06-11 Thread Godmar Back
Yes - see this email
http://serials.infomotions.com/code4lib/archive/2009/200905/0909.html

If you can host yourself, the stand-alone version is efficient and easy to
keep up to date - just run a cronjob that downloads the text file from JISC.
My WSGI script will automatically pick up if it has changed on disk.
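
Roughly, the reload logic looks like this (a sketch, not the actual
script; it assumes a tab-separated jrss.txt with title, ISSN, and feed
URL columns - check the real file for the exact layout):

  import os

  DATA_FILE = "jrss.txt"  # cronjob: curl -o jrss.txt http://www.tictocs.ac.uk/text.php
  _loaded_mtime = 0
  _issn_map = {}

  def reload_if_changed():
      """Re-read the data file only when the cronjob has replaced it."""
      global _loaded_mtime
      mtime = os.path.getmtime(DATA_FILE)
      if mtime > _loaded_mtime:
          _issn_map.clear()
          with open(DATA_FILE) as f:
              for line in f:
                  cols = line.rstrip("\n").split("\t")
                  if len(cols) >= 3:   # assumed column order: title, ISSN, feed URL
                      _issn_map[cols[1]] = cols
          _loaded_mtime = mtime

Call reload_if_changed() at the top of each request handler and the
data stays current without restarting the server.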

 - Godmar

On Thu, Jun 11, 2009 at 4:08 PM, Annette Bailey  wrote:

> Godmar Back wrote a web service in python for ticTOC with an eye to
> incorporating links into III's Millennium catalog.
>
> http://code.google.com/p/tictoclookup/
>
> http://tictoclookup.appspot.com/
>
> Annette
>
> On Thu, Jun 11, 2009 at 12:34 PM, Derik Badman wrote:
> > Hello all,
> >
> > Just joined the list, and I'm hoping to get a suggestion or two.
> >
> > I'm working on using the ticTOCs ( http://www.tictocs.ac.uk/ ) text file
> of
> > rss feed urls for journals to insert links to those feeds in our Serials
> > Solution Journal Finder.
> >
> > I've got it working using a bit of jQuery.
> >
> > Demo here: http://155.247.22.22/badman/toc/demo.html
> > The javascript is here: http://155.247.22.22/badman/toc/toc-rss.js
> >
> > Getting that working wasn't too hard, but I'm a bit concerned about
> > efficiency and caching.
> >
> > I'm not sure the way I'm checking ISSNs against the text file is the most
> > efficient way to go. Basically I'm making an ajax call to the file that
> > takes the data and makes an array of objects. I then query the ISSN of
> each
> > journal on the page against the array of objects. If there's a match I
> pull
> > the data and put it on the page. I'm wondering if there's a better way to
> do
> > this, especially since the text file is over 1mb. I'm not looking for
> code,
> > just ideas.
> >
> > I'm also looking for any pointers about using the file itself and somehow
> > auto-downloading it to my server on a regular basis. Right now I just
> saved
> > a copy to my server, but in the future it'd be good to automate grabbing
> the
> > file from ticTOCs server on a regular basis and updating the one on my
> > server (perhaps I'd need to use a cron job to do that?).
> >
> > Thanks so much for any suggestions or pointers. (For what it's worth, I
> can
> > manage with javascript or php.)
> >
> >
> > --
> > Derik A. Badman
> > Digital Services Librarian
> > Reference Librarian for Education and Social Work
> > Temple University Libraries
> > Paley Library 209
> > Philadelphia, PA
> > Phone: 215-204-5250
> > Email: dbad...@temple.edu
> > AIM: derikbad
> >
> > "Research makes times march forward, it makes time march backward, and it
> > also makes time stand still." -Greil Marcus
> >
>


Re: [CODE4LIB] A Book Grab by Google

2009-05-20 Thread Godmar Back
On Wed, May 20, 2009 at 8:42 PM, Karen Coyle  wrote:
>
> No, it's not uniquely Google, but adding another price pressure point to
> libraries is still seen as detrimental.
>

I'm sure you saw:
http://www.nytimes.com/2009/05/21/technology/companies/21google.html

"The new agreement, which Google hopes other libraries will endorse,
lets the University of Michigan object if it thinks the prices Google
charges libraries for access to its digital collection are too high, a
major concern of some librarians. Any pricing dispute would be
resolved through arbitration."

 - Godmar


Re: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes

2009-05-19 Thread Godmar Back
On Tue, May 19, 2009 at 8:26 AM, Boheemen, Peter van
 wrote:
> Clever idea to put the TicToc stuff 'in the cloud'. How are you going to
> keep it up-to-date ?

By periodically reuploading the entire set (which takes about 15-20
mins), new or changed records can be updated. A changed record is one
with a new RSS feed for the same ISSN + Title combination; the data is
keyed by ISSN+Title. This process can be optimized by only uploading
the delta (you upload .csv files, so the delta can be obtained easily
via comm(1)).

Removing records is a bit of a hassle since GAE does not provide an
easy-to-use interface for that. It's possible to wipe an entire table
clean by repeatedly deleting 500 records at a time (the entire set is
about 19,000 records), then doing a fresh import. This can be done by
uploading a "console" application into the cloud.
(http://con.appspot.com/console/help/about ) Alternatively, smaller
sets of records can be deleted via a "remove" handler, which I haven't
implemented yet.  A script will need to post the data to be removed
against the handler. Will do that though if anybody uses it. User
impact is low if old records aren't removed.
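
The wipe itself is short (a sketch against the GAE db API; the model
name Tictoc is assumed):

  from google.appengine.ext import db

  def wipe(model_class):
      """Delete all entities, 500 at a time (the per-call limit)."""
      while True:
          batch = model_class.all().fetch(500)
          if not batch:
              break
          db.delete(batch)

  # run from the uploaded console app, e.g.: wipe(Tictoc)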

A possible alternative is to have the GAE app periodically verify the
validity of each requested record with a server we'd have to run.
(Pulling the data straight from tictocs.ac.uk doesn't work since it's
larger than what you're allowed to fetch.) This approach would somewhat
defeat the idea of the cloud since we'd have to rely on keeping that
server operational, albeit at a lower degree of availability and load.

Another potential issue is the quota Google provides: you get 10GBytes
and 1.3M requests free per 24 hour period, then they start charging
you ($.12 per GByte)

I think I mentioned in my post that I included a non-GAE version of
the server that only requires mod_wsgi. For that standalone version,
keeping the data set up to date is implemented by checking the last
mod time of its local copy - it will reread its data when it detects
a more recent jrss.txt in its current directory, so keeping its data
up to date is as simple as periodically curling
http://www.tictocs.ac.uk/text.php

 - Godmar


[CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes

2009-05-18 Thread Godmar Back
Hi,

I would like to share a few pointers to web services and widgets
Annette and I recently collaborated on. All are available under an
open source license.

"Widgets" are CSS-styled HTML elements ( or ) that provide
dynamic behavior related to the underlying web service. These are
suitable for non-JavaScript programmers familiar with HTML/CSS.

1. MAJAX 2: Includes a JSON web service (e.g.,
http://libx.lib.vt.edu/services/majax2/isbn/1412936373 or
http://libx.lib.vt.edu/services/majax2/isbn/006073132x?opacbase=http%3A%2F%2Flibcat.lafayette.edu%2Fsearch&jsoncallback=majax.processResults
) and a set of widgets to include results into web pages, see
http://libx.lib.vt.edu/services/majax2/  Supports the same set of
features as MAJAX 1 (libx.org/majax)
Source is at http://code.google.com/p/majax2/

2. ticTOC lookup: is a Google App Engine app that provides a REST
interface to JISC's ticTOC data set that maps ISSN to URLs of table of
contents RSS feeds. See http://tictoclookup.appspot.com/
Example: http://tictoclookup.appspot.com/0028-0836 and optional
refinement by title:
http://tictoclookup.appspot.com/0028-0836?title=Nature
A widget library is available; see
http://laurel.lib.vt.edu/record=b1251610~S7 for a demo (shows floating
tooltips with table of contents preview via Google Feeds and places a
link to RSS feeds)  The source is at
http://code.google.com/p/tictoclookup/ and includes a stand-alone
version of the web service which doesn't use GAE. The widget library
includes support for integration into III's record display.

3. Google Book Classes at http://libx.lib.vt.edu/services/googlebooks/
- these are widgets for Google's Book Search Dynamic Links API.
Noteworthy is support for integration into III's OPAC on the search
results page ("briefcit.html"), on the so-called bib display page
("bib_display.html") and their "WebBridge" product via field
selectors, all without JavaScript. Source is at
http://code.google.com/p/googlebooks/

4. A Link/360 JSON Proxy.  See
http://libx.lib.vt.edu/services/link360/index.html
This one takes Serials Solution's Link/360 XML Service and proxies it
as JSON. Currently does not include a widget set. Caches results 24
hours to match db update frequency.  Source is at
http://code.google.com/p/link360/  Could be combined with a widget
library, or programmed to directly, to weave Link/360 holdings data
into pages.

All JSON services accept 'jsoncallback=' for cross-domain client-side
integration.  The libx.lib.vt.edu URLs are ok to use for testing, but
for production use we recommend your own server. All modules are
written in Python as WSGI scripts, requiring setup as simple as
mod_wsgi + .htaccess.

 - Godmar


Re: [CODE4LIB] Q: AtomPub (APP) server libraries for Python?

2009-01-28 Thread Godmar Back
> 2) an XML library that doesn't choke on foreign characters. (I assume
> you're using ElementTree now?)

I meant foreign markup, as in foreign to the atom: namespace.

Let me give an example. Suppose I want to serve results the way Google
does in YouTube; suppose I want to return XML similar to this one:

http://gdata.youtube.com/feeds/api/videos?vq=triumph+street+triple&racy=include&orderby=viewCount

It contains lots of foreign XML (opensearch, etc.) and it contains
lots of boilerplate (title, link, id, updated, category, etc. etc.)
that must be gotten right to be Atom-compliant. I don't want to
implement any of this.

I'd like to write the minimum amount of code that can turn information
I have in flat files into Atom documents, without having to worry
about the well-formedness or even construction of an Atom feed, or its
internal consistency.
(Think of something similar to Pilgrim's feedparser, but on the
producing side - feedparser itself a) doesn't handle all of Atom, b)
doesn't support foreign XML - in fact, it doesn't even use an XML
library - and it is generally not intended for the creation of feeds.)
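
To illustrate the boilerplate I'd like to avoid writing, here is a
bare-bones sketch using ElementTree (modern Python; it doesn't even
touch links, authors, or foreign namespaces yet):

  import xml.etree.ElementTree as ET
  from datetime import datetime, timezone

  ATOM = "http://www.w3.org/2005/Atom"
  ET.register_namespace("", ATOM)

  def make_feed(feed_id, title, entries):
      feed = ET.Element("{%s}feed" % ATOM)
      ET.SubElement(feed, "{%s}id" % ATOM).text = feed_id
      ET.SubElement(feed, "{%s}title" % ATOM).text = title
      ET.SubElement(feed, "{%s}updated" % ATOM).text = \
          datetime.now(timezone.utc).isoformat()
      for e in entries:  # each entry: a dict with id/title/updated keys
          entry = ET.SubElement(feed, "{%s}entry" % ATOM)
          for field in ("id", "title", "updated"):
              ET.SubElement(entry, "{%s}%s" % (ATOM, field)).text = e[field]
      return ET.tostring(feed, encoding="unicode")

And this still leaves Atom's other mandatory elements, link relations,
and embedded foreign markup for the library to get right.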

Given the adoption RFC 5023 has seen by major companies, I'm really
surprised at the lack of any supporting server libraries; perhaps not
surprisingly, the same is not true for client libraries.

 - Godmar

On Wed, Jan 28, 2009 at 9:43 AM, Ross Singer  wrote:
> Godmar,
>
> What do you need the library to do?  It seems like you'd be able to
> make an AtomPub server pretty easily with web.py (you could use the
> Jangle Core as a template, it's in Ruby, but the framework it uses,
> Sinatra, is very similar to web.py).
>
> It seems like there are two things you need here:
>
> 1) something that can RESTfully broker a bunch of incoming HTTP
> requests and return Atom Feeds and Service documents
>
> Is that right?
> -Ross.
>
> On Wed, Jan 28, 2009 at 8:13 AM, Godmar Back  wrote:
>> Hi,
>>
>> does anybody know or can recommend any server side libraries for
>> Python that produce AtomPub (APP)?
>>
>> Here are the options I found, none of which appear suitable for what
>> I'd like to do:
>>
>> amplee: 
>> http://mail.python.org/pipermail/python-announce-list/2008-February/006436.html
>> django-atompub:  http://code.google.com/p/django-atompub/
>> flatatompub http://blog.ianbicking.org/2007/09/12/flatatompub/
>>
>> Either they are immature, require frameworks, or are frameworks
>> themselves, and most cannot handle foreign XML well.
>>
>>  - Godmar
>>
>


[CODE4LIB] Q: AtomPub (APP) server libraries for Python?

2009-01-28 Thread Godmar Back
Hi,

does anybody know or can recommend any server side libraries for
Python that produce AtomPub (APP)?

Here are the options I found, none of which appear suitable for what
I'd like to do:

amplee: 
http://mail.python.org/pipermail/python-announce-list/2008-February/006436.html
django-atompub:  http://code.google.com/p/django-atompub/
flatatompub http://blog.ianbicking.org/2007/09/12/flatatompub/

Either they are immature, require frameworks, or are frameworks
themselves, and most cannot handle foreign XML well.

 - Godmar


Re: [CODE4LIB] COinS in OL?

2008-12-05 Thread Godmar Back
On Fri, Dec 5, 2008 at 1:14 PM, Ross Singer <[EMAIL PROTECTED]> wrote:
> On Fri, Dec 5, 2008 at 10:50 AM, Godmar Back <[EMAIL PROTECTED]> wrote:
>
>> BTW, I don't see why screen readers would stumble over this when the
>> child of the  is empty. Do they try to read empty text?  And if
>> a COinS is processed, we fix up the title so tooltips show nicely.
>
> Thinking about this a bit more -- does this leave the COinS in an
> unusable state if some other agent executes after LibX is done?
>

I spoke too soon. We don't touch the 'title' attribute.

But we put content in the previously empty <span>, so there is
a potential problem with a screen reader then. (That content, though,
has its own 'title' attribute.)

 - Godmar


Re: [CODE4LIB] COinS in OL?

2008-12-05 Thread Godmar Back
On Thu, Dec 4, 2008 at 2:31 PM, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:
> Not that I know of.
>
> You can say display:none, but that'll probably hide it from LibX etc too.

No, why would it.

BTW, I don't see why screen readers would stumble over this when the
child of the <span> is empty. Do they try to read empty text?  And if
a COinS is processed, we fix up the title so tooltips show nicely.

 - Godmar

>
> What is needed is a CSS @media for screen readers, like one exists for
> 'print'. So you could have a seperate stylesheet for screenreaders, like you
> can have a seperate stylesheet for print. That would be the right way to do
> it.
>
> But doesn't exist.
>
> Jonathan
>
> Thomas Dowling wrote:
>>
>> On 12/04/2008 02:02 PM, Jonathan Rochkind wrote:
>>
>>>
>>> Yeah, I had recently noticed indepedently, been unhappy with the way a
>>> COinS "title" shows up in mouse-overs, and is reccommended to be used by
>>> screen readers. Oops.
>>>
>>>
>>
>> By any chance, do current screen readers honor something like '> class="Z3988" style="speak:none" title=...>'?
>>
>>
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886 rochkind (at) jhu.edu
>


Re: [CODE4LIB] COinS in OL?

2008-12-04 Thread Godmar Back
On Wed, Dec 3, 2008 at 9:12 PM, Ed Summers <[EMAIL PROTECTED]> wrote:
> On Tue, Dec 2, 2008 at 3:11 PM, Godmar Back <[EMAIL PROTECTED]> wrote:
>> COinS are still needed, in particular in situations in which multiple
>> resources are displayed on a page (like, for instance, in the search
>> results pages of most online systems or on pages such as
>> http://citeulike.org, or in a list of references such as in the
>> "references" section of many Wikipedia pages.)
>
> JSON is perfectly capable of returning a list of things.
>

True, but that's beside the point.

The metadata needs to be related to some element on the page, such as
the text in a reference. The most natural way to do this (and COinS
allows this) is to place the COinS next to (for instance) the
reference to which it refers.
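
For reference, a COinS is just an empty span whose title attribute
carries the citation in OpenURL KEV form; a minimal sketch of
generating one (modern Python; the field values are illustrative):

  from urllib.parse import urlencode
  from html import escape

  def coins_span(fields):
      kev = [("ctx_ver", "Z39.88-2004"),
             ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book")]
      kev += [("rft." + k, v) for k, v in fields.items()]
      return '<span class="Z3988" title="%s"></span>' % escape(urlencode(kev))

  print(coins_span({"btitle": "Eve's Diary", "aulast": "Twain"}))

Emitting one of these next to each reference is all a producer needs
to do; consumers like LibX take it from there.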

 - Godmar


Re: [CODE4LIB] COinS in OL?

2008-12-02 Thread Godmar Back
Having a per-page link to get an alternate representation of a
resource is certainly helpful for some applications, and please do
support it, but don't consider the problem solved.

The primary weakness of this approach is that it works only if a page
is dedicated to a single resource.

COinS are still needed, in particular in situations in which multiple
resources are displayed on a page (like, for instance, in the search
results pages of most online systems or on pages such as
http://citeulike.org, or in a list of references such as in the
"references" section of many Wikipedia pages.)

 - Godmar

On Mon, Dec 1, 2008 at 11:21 PM, Ed Summers <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 1, 2008 at 11:05 PM, Karen Coyle <[EMAIL PROTECTED]> wrote:
>> I asked about COinS because it's something I have vague knowledge of. (And I
>> assume it isn't too difficult to implement.) However, if there are other
>> services that would make a bigger difference, I invite you (all) to speak
>> up. It makes little sense to have this large quantity of bib data if it
>> isn't widely and easily usable.
>
> Sorry to be overwhelming. I guess the main thing I wanted to
> communicate is that you could simply add:
>
>   href="http://openlibrary.org/api/get?key=/b/{open-library-id}"; />
>
> to the  element in OpenLibrary HTML pages for books, and that
> would go a long way to making machine readable data for books
> discoverable by web clients.
>
> //Ed
>


Re: [CODE4LIB] COinS in OL?

2008-12-01 Thread Godmar Back
Correct.

Right now, COinS handling in LibX 1.0 is primitive and always links to
the OpenURL resolver. However, LibX 2.0 will allow customized handling
so that, for instance, ISBN COinS can be treated differently than
dissertation COinS or article CoinS.  The framework for this is
already partially in place, so ambitious JavaScript programmers can
implement such custom handling for their extension; with LibX 2.0,
every LibX maintainer will be able to choose their own preferred way
of making use of COinS.

When you place COinS, don't assume it'll only be used by tools that
simply read the info from it - place it in a place in your DOM where
there's some white space, or where placing a small link or icon would
not destroy the look and feel of your interface.

 - Godmar

On Mon, Dec 1, 2008 at 11:45 AM, Stephens, Owen
<[EMAIL PROTECTED]> wrote:
> LibX uses COinS as well I think - so generally be useful in taking
> people from the global context (Open Library) to the local (via LibX)
>
> Owen
>
> Owen Stephens
> Assistant Director: eStrategy and Information Resources
> Central Library
> Imperial College London
> South Kensington Campus
> London
> SW7 2AZ
>
> t: +44 (0)20 7594 8829
> e: [EMAIL PROTECTED]
>
>> -Original Message-
>> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf
> Of
>> Karen Coyle
>> Sent: 01 December 2008 16:08
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: [CODE4LIB] COinS in OL?
>>
>> I have a question to ask for the Open Library folks and I couldn't
>> quite
>> figure out where to ask it. This seems like a good place.
>>
>> Would it be useful to embed COinS in the book pages of the Open
>> Library?
>> Does anyone think they might make use of them?
>>
>> Thanks,
>> kc
>>
>> --
>> ---
>> Karen Coyle / Digital Library Consultant
>> [EMAIL PROTECTED] http://www.kcoyle.net
>> ph.: 510-540-7596   skype: kcoylenet
>> fx.: 510-848-3913
>> mo.: 510-435-8234
>> 
>


[CODE4LIB] GAE sample (was: a brief summary of the Google App Engine)

2008-07-16 Thread Godmar Back
FWIW, the sample application I built to familiarize myself with GAE is
a simple REST cache. It's written in < 250 lines overall, including
Python + YAML.

For instance, a resource such as:
http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=xml&id=3966282
can be accessed via GAE using:
http://libxcache.appspot.com/get?url=http%3a%2f%2fwww.ncbi.nlm.nih.gov%2fentrez%2feutils%2fesummary.fcgi%3fdb%3dpubmed%26retmode%3dxml%26id%3d3966282

Or, you can access:
http://demo.jangle.org/openbiblio/resources/5974
as
http://libxcache.appspot.com/get?url=http%3a%2f%2fdemo.jangle.org%2fopenbiblio%2fresources%2f5974
(To take some load off that Jangle demo, Ross, in case it's slashdotted.)
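
The gist of it - not the actual code, just a sketch using GAE's
webapp, urlfetch, and memcache APIs - fits in about a dozen lines:

  from google.appengine.api import memcache, urlfetch
  from google.appengine.ext import webapp

  class Get(webapp.RequestHandler):
      def get(self):
          url = self.request.get("url")
          body = memcache.get(url)
          if body is None:  # cache miss: fetch once, remember for an hour
              body = urlfetch.fetch(url).content
              memcache.set(url, body, time=3600)
          self.response.out.write(body)

  application = webapp.WSGIApplication([("/get", Get)])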

 - Godmar


Re: [CODE4LIB] a brief summary of the Google App Engine

2008-07-16 Thread Godmar Back
On Wed, Jul 16, 2008 at 6:29 AM, Keith Jenkins <[EMAIL PROTECTED]> wrote:
> But for anything larger, you'd probably want to figure
> out a way to manually build an index within the Google datastore, or
> else keep the indexing outside GAE, and just use GAE for fetching
> specified records.  Any ideas on how that might work?
>

Presumably, it would work the same way Google's primary web index does
- which after all uses the same infrastructure: you chop each field
into words, then build an inverted index that maps each word to the
list of records (in Google: pages) that contain it. See Brin/Page's
original paper for details:
http://infolab.stanford.edu/~backrub/google.html
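
In miniature, the idea is:

  from collections import defaultdict

  def build_index(records):
      """Map each word to the set of record ids that contain it."""
      index = defaultdict(set)
      for rid, text in records.items():
          for word in text.lower().split():
              index[word].add(rid)
      return index

  idx = build_index({1: "Eve's Diary", 2: "The Diary of a Nobody"})
  print(sorted(idx["diary"]))   # -> [1, 2]

In the GAE datastore you could store each posting list as an entity
keyed by the word, or put the word list in a db.StringListProperty,
which the datastore indexes per element.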

 - Godmar


Re: [CODE4LIB] a brief summary of the Google App Engine

2008-07-15 Thread Godmar Back
On Tue, Jul 15, 2008 at 2:16 PM, Fernando Gomez <[EMAIL PROTECTED]> wrote:
>
> Any thoughts about a convenient way of storing and (more importantly)
> indexing & retrieving MARC records using GAE's Bigtable?
>

GAE uses a Django-style object model. You can define a Python
class that inherits from db.Model and declare the properties of your
model; instances can then be created, stored, retrieved and updated.
GAE performs automatic indexing on some fields, and you can tell it to
index on others, or on certain combinations.
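
Declaring a model for bib data is straightforward (a sketch - the
property names are made up):

  from google.appengine.ext import db

  class MarcRecord(db.Model):
      title = db.StringProperty()    # simple properties are indexed automatically
      author = db.StringProperty()
      isbn = db.StringProperty()
      raw = db.BlobProperty()        # the full MARC record; blobs are never indexed

  MarcRecord(title="Eve's Diary", author="Twain, Mark").put()
  hits = MarcRecord.gql("WHERE author = :1", "Twain, Mark").fetch(10)

The hard part, as noted below, is deciding what to put into those
properties so that discovery-style searches work.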

Aside from the limitations imposed by the index model, the problem
then is fundamentally similar to how you index MARC data for use in
any discovery system.  Presumably, you could learn from the
experiences of the many projects that have done that - some in Python,
such as http://code.google.com/p/fac-back-opac/  (though they use
Django, they don't appear to be using its object-relational db model
for MARC records; I say this from a 2 min examination of parts of
their code; I may be wrong. PyMarc itself doesn't support it.)

 - Godmar


[CODE4LIB] a brief summary of the Google App Engine

2008-07-13 Thread Godmar Back
Hi,

since I brought up the issue of the Google App Engine (GAE) (or
similar services, such as Amazon's EC2 "Elastic Compute Cloud"), I
thought I give a brief overview of what it can and cannot do, such
that we may judge its potential use for library services.

GAE is a cloud infrastructure into which developers can upload
applications. These applications are replicated among Google's network
of data centers and they have access to its computational resources.
Each application has access to a certain amount of resources at no
fee; Google recently announced the pricing for applications whose
resource use exceeds the "no fee" threshold [1]. The no fee threshold
is rather substantial: 500MB of persistent storage, and, according to
Google, enough bandwidth and cycles to serve about "5 million page
views" per month.

Google Apps must be written in Python. They run in a sandboxed
environment. This environment limits what applications can do and how
they communicate with the outside world.  Overall, the sandbox is very
flexible - in particular, application developers have the option of
uploading additional Python libraries of their choice with their
application. The restrictions lie primarily in security and resource
management. For instance, you cannot use arbitrary socket connections
(all outside world communication must be through GAE's "fetch" service
which supports http/https only), you cannot fork processes or threads
(which would use up CPU cycles), and you cannot write to the
filesystem (instead, you must store all of your persistent data in
Google's scalable datastore, which is also known as BigTable.)

All resource usage (CPU, Bandwidth, Persistent Storage - though not
memory) is accounted for and you can see your use in the application's
"dashboard" control panel. Resources are replenished on the fly where
possible, as in the case of CPU and Bandwidth. Developers are
currently restricted to 3 applications per account. Making
applications in multiple accounts work in tandem to work around quota
limitations is against Google's terms of use.

Applications are described by a configuration file that maps URI paths
to scripts in a manner similar to how you would use Apache
mod_rewrite.  URIs can also be mapped to explicitly named static
resources such as images. Static resources are uploaded along with
your application and, like the application, are replicated in Google's
server network.

The programming environment is CGI 1.1.  Google suggests, but doesn't
require, the use of supporting libraries for this model, such as WSGI.
 This use of high-level libraries allows applications to be written in
a very compact, high-level style, the way one is used to from Python.
In addition to the WSGI framework, this allows the use of several
template libraries, such as Django.  Since the model is CGI 1.1, there
are no or very little restrictions on what can be returned - you can
return, for instance, XML or JSON and you have full control over the
Content-Type: returned.

The execution model is request-based.  If a client request arrives,
GAE will start a new instance (or reuse an existing instance if
possible), then invoke the main() method. At this point, you have a
set limit to process this request (though not explicitly stated in
Google's doc, the limit appears to be currently 9 seconds) and return
a result to the client. Note that this per-request limit is a maximum;
you should usually be much quicker in your response. Also note that
any CPU cycles you use during those 9 seconds (but not time you spent
waiting for fetch results from other application tiers) count against your
overall CPU budget.

The key service the GAE runtime libraries provide is the Google
datastore, aka BigTable [2].
You can think of this service as a highly efficient, persistent store
for structured data. You may think of it as a simplified database that
allows the creation, retrieval, updating, and deletion (CRUD) of
entries using keys and, optionally, indices. It provides limited
support for transactions as well. Though it is less powerful than
conventional relational databases - which aren't nearly as scalable -
it can be accessed using GQL, a query language that's similar in
spirit to SQL.  Notably, GQL (or BigTable) does not support JOINs,
which means that you will have to adjust your traditional approach to
database normalization.
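
In practice, "adjusting normalization" means denormalizing: since you
cannot JOIN, you copy the fields you query on into the entity itself.
A sketch:

  from google.appengine.ext import db

  class Book(db.Model):
      title = db.StringProperty()
      publisher_name = db.StringProperty()  # duplicated here instead of JOINed

  # one GQL query, no JOIN needed:
  q = db.GqlQuery("SELECT * FROM Book WHERE publisher_name = :1", "O'Reilly")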

The Python binding for the structured data is intuitive and seamless.
You simply declare a Python class for the properties of objects you
wish to store, along with the types of the properties you wish
included, and you can subsequently use a put() or delete() method to
write and delete. Queries will return instances of the objects you
placed in a given table.  Tables are named after the Python classes.

Google provides a number of additional runtime libraries, such as for
simple image processing a la Google Picasa, for the sending of email
(subject to resource limits), and for user authentication, solely
using Google Accounts.

Re: [CODE4LIB] anyone know about Inera?

2008-07-12 Thread Godmar Back
Min, Eric, and others working in this domain -

have you considered designing your software as a scalable web service
from the get-go, using such frameworks as Google App Engine? You may
be able to use Montepython for the CRF computations
(http://montepython.sourceforge.net/)

I know Min offers a WSDL wrapper around their software, but that's
simply a gateway to one single-machine installation, and it's not
intended as a production service at that.

 - Godmar

On Sat, Jul 12, 2008 at 3:20 AM, Min-Yen Kan <[EMAIL PROTECTED]> wrote:
> Hi Steve, all:
>
> I'm the key developer of ParsCit.  I'm glad to hear your feedback
> about what doesn't work with ParsCit.  Erik is correct in saying that
> we have only trained the system for what data we have correct answers
> for, namely computer science.  As such it doesn't perform well with
> other data (especially health sciences citations, which we have also
> done some pilot tests on).  I note that there are other citation
> parsers out there, including Erik's own HMM parser (I think Erik
> mentioned it as well, available from his website here:
> http://gales.cdlib.org/~egh/hmm-citation-extractor/)
>
> Anyways, I've tried your citation too, and got the same results from
> the demo -- it doesn't handle the authors correctly in this case.  I
> would very much love to have as many example cases of incorrectly
> parsed citations as the community is willing to share with us so we
> can improve ParsCit (it's open source so all can benefit from
> improvements to ParsCit).
>
> We are trying to be as proactive as possible about maintaining and
> improving ParsCit.  I know of at least two groups that have said they
> are willing to contribute more citations (with correct markings) to us
> so that we can re-train ParsCit, and there is interest in porting it
> to other languages (i.e. German right now).  We would love to get
> samples of your data too, where the program does go wrong, to help
> improve our system.  And to get feedback on other fields that need to
> be parsed as well: ISSNs, ISBNs, volumes, and issues.
>
> We are also looking to make the output of the ParsCit system
> compatible with EndNote, BibTeX.  We actually have an internal project
> to try to hook up ParsCit to find references on arbitrary web pages
> (to form something like Zotero that's not site specific and
> non-template based).  If and when this project comes to fruition we'll
> be announcing it to the list.
>
> If anyone has used ParsCit and has feedback on what can be further
> improved we'd love to hear from you.  You are our target audience!
>
> Cheers,
>
> Min
>
> --
> Min-Yen KAN (Dr) :: Assistant Professor :: National University of
> Singapore :: School of Computing, AS6 05-12, Law Link, Singapore
> 117590 :: 65-6516 1885(DID) :: 65-6779 4580 (Fax) ::
> [EMAIL PROTECTED] (E) :: www.comp.nus.edu.sg/~kanmy (W)
>
> PS: Hi Erik, still planning on studying your HMM package for improving
> ParsCit ... It's on my agenda.
> Thanks again.
>
> On Sat, Jul 12, 2008 at 5:36 AM, Steve Oberg <[EMAIL PROTECTED]> wrote:
>> Yeah, I am beginning to wonder, based on these really helpful replies, if I
>> need to scale back to what is "doable" and "reasonable." And reassess
>> ParsCit.
>>
>> Thanks to all for this additional information.
>>
>> Steve
>>
>> On Fri, Jul 11, 2008 at 4:18 PM, Nate Vack <[EMAIL PROTECTED]> wrote:
>>
>>> On Fri, Jul 11, 2008 at 3:57 PM, Steve Oberg <[EMAIL PROTECTED]> wrote:
>>>
>>> > I fully realize how much of a risk that is in terms of reliability and
>>> > maintenance.  But right now I just want a way to do this in bulk with a
>>> high
>>> > level of accuracy.
>>>
>>> How bad is it, really, if you get some (5%?) bad requests into your
>>> document delivery system? Customers submit poor quality requests by
>>> hand with some frequency, last I checked...
>>>
>>> Especially if you can hack your system to deliver the original
>>> citation all the way into your doc delivery system, you may be able to
>>> make the case that 'this is a good service to offer; let's just deal
>>> with the bad parses manually.'
>>>
>>> Trying to solve this via pure technology is gonna get into a world of
>>> diminishing returns. A surprising number of citations in references
>>> sections are wrong. Some correct citations are really hard to parse,
>>> even by humans who look at a lot of citations.
>>>
>>> ParsCit has, in my limited testing, worked as well as anything I've
>>> seen (commercial or OSS), and much better than most.
>>>
>>> My $0.02,
>>> -Nate
>>>
>>
>


Re: [CODE4LIB] use of OpenSearch response elements in libraries?

2008-06-24 Thread Godmar Back
I too find this decision intriguing, and I'm wondering about its wider
implications on the use of RSS/Atom as a container format inside and
outside the context of OpenSearch as it relates to library systems.

I note that an OpenSearch description does not allow you to specify
the type of the items contained within an RSS or Atom feed being
advertised. As such, it's impossible to advertise multiple output
formats within a single OpenSearchDescription (specifically, you can
only have 1  element with 'type="application/rss+xml"').
Therefore, clients consuming OpenSearch must be prepared to interpret
the record types correctly, but cannot learn from the server a priori
what those are.

My guess would be that OCLC is shooting for OpenSearch consumers that
expect RSS/Atom feeds and that have some generic knowledge on how to
process items that contain, for instance, HTML; but at the same time
are unprepared to handle MARCXML or other metadata formats. Examples
may include Google Reader or the A9 metasearch engine.

The alternative, SRU, carries no expectation that items be processed
by clients that are unaware of library metadata formats. In addition,
its 'explain' verb allows clients to learn which metadata formats they
can request.

This may be reviving a discussion that an Internet search shows was
very active in the community about 4 years ago, although 4 years
later, I was unable to find out the outcome of this discussion, so it
may be good to capture the current thinking.

What client applications currently consume OpenSearch results vs. what
client applications consume SRU results?

I understand that a number of ILS vendors besides OCLC have already or
are in the process of providing web services interfaces to their
catalog; do they choose OpenSearch and/or SRU, or a heterogeneous mix
in the way OCLC does. If they choose OpenSearch, do they use RSS or
ATOM feeds to carry metadata records?

 - Godmar

On Tue, Jun 24, 2008 at 1:23 PM, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:
> In general, is there a reason to have different metadata formats from SRU vs
> OpenSearch? Is there a way to just have the same metadata formats available
> for each? Or are the demands of each too different to just use the same
> underlying infrastructure, such that it really does take more work to
> include a metadata format as an OpenSearch option even if it's already been
> included as an SRU option?
>
> Personally, I'd like these alternate access methods to still have the same
> metadata format options, if possible. And other options. Everything should
> be as consistent as possible to avoid confusion.
>
> Jonathan
>
> Washburn,Bruce wrote:
>>
>> Godmar,
>>
>> I'm one of the developers working on the WorldCat API.  My take is that
>> the API is evolving and adapting as we learn more about how it's
>> expected to be used.  We haven't precluded the addition of more record
>> metadata to OpenSearch responses; we opted not to implement it until we
>> had more evidence of need.
>> As you've noted, WorldCat API OpenSearch responses are currently limited
>> to title and author information plus a formatted bibliographic citation,
>> while more complete record metadata is available in DC or MARC XML in
>> SRU responses. Until now we had not seen a strong push from the API
>> early implementers for more record metadata in OpenSearch responses,
>> based on direct feedback and actual use.  I can see how it could be a
>> useful addition, though, so we'll look into it.
>>
>> Bruce
>>
>>
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886 rochkind (at) jhu.edu
>


Re: [CODE4LIB] use of OpenSearch response elements in libraries?

2008-06-24 Thread Godmar Back
[ this discussion may be a bit too detailed for the general readership of
code4lib; readers not interested in the upcoming WC search API may wish to
skip... ]

Roy,

Atom/RSS are simply the container formats used to return multiple items of
some kind --- I'm curious about what those items contain.

In the example shown in
http://worldcat.org/devnet/index.php/SearchAPIDetails#Using_OpenSearch it
appears that the items are only preformatted citations, rather than, for
instance, MARCXML or DC representation of records.  (The SRU interface, on
the other hand, appears to return MARCXML/DC.)  Is this impression false and
does the OpenSearch API in fact return record metadata beyond preformatted
citations? (I note that your search syntax for OpenSearch does not allow
choice of a recordSchema.)

If so, what's the rationale for not supporting the retrieval of record
metadata via OpenSearch?

 - Godmar

On Tue, Jun 24, 2008 at 10:17 AM, Roy Tennant <[EMAIL PROTECTED]> wrote:

> To be specific, currently supported record formats for an OpenSearch query
> of the WorldCat API are Atom and RSS as well as the preformatted citation.
> Roy
>
>
> On 6/23/08 10:18 PM, "Godmar Back" <[EMAIL PROTECTED]> wrote:
>
> > Thanks --- let me do some "query refinement" then -- does anybody know of
> > examples where record metadata (e.g., MARCXML or DC) is returned as an
> > OpenSearch response?  [ If I understand the proposed Worldcat API
> correctly,
> > OpenSearch is used only for pre-formatted citations in HTML. ]
> >
> >  - Godmar
> >
> > On Tue, Jun 24, 2008 at 12:54 AM, Roy Tennant <[EMAIL PROTECTED]> wrote:
> >
> >> I believe WorldCat qualifies, although the API is not yet ready for
> general
> >> release (but soon):
> >>
> >> <http://worldcat.org/devnet/index.php/SearchAPIDetails>
> >>
> >> Roy
> >>
> >>
> >> On 6/23/08 8:55 PM, "Godmar Back" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi,
> >>>
> >>> are there any examples of functioning OpenSearch interfaces to library
> >>> catalogs or library information systems?
> >>>
> >>> I'm specifically interested in those that not only advertise a
> text/html
> >>> interface to their catalog, but who include OpenSearch response
> elements.
> >>> One example I've found is Evergreen; though it's not clear to what
> extent
> >>> this interface is used or implemented. For instance, their demo
> >>> installation's OpenSearch description advertises an ATOM feed, but
> what's
> >>> returned doesn't validate. (*)
> >>>
> >>> Are there other examples deployed (and does anybody know applications
> >> that
> >>> consume OpenSearch feeds?)
> >>>
> >>>  - Godmar
> >>>
> >>> (*) See, for instance:
> >>> http://demo.gapines.org/opac/extras/opensearch/1.1/PINES/atom-full/keyword/?searchTerms=music&startPage=&startIndex=&count=&searchLang
> >>> which is not a valid ATOM feed:
> >>> http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fdemo.gapines.org%2Fopac%2Fextras%2Fopensearch%2F1.1%2FPINES%2Fatom-full%2Fkeyword%2F%3FsearchTerms%3Dmusic%26startPage%3D%26startIndex%3D%26count%3D%26searchLang
> >>>
> >>
> >> --
> >>
> >
>
> --
>


Re: [CODE4LIB] use of OpenSearch response elements in libraries?

2008-06-23 Thread Godmar Back
Thanks --- let me do some "query refinement" then -- does anybody know of
examples where record metadata (e.g., MARCXML or DC) is returned as an
OpenSearch response?  [ If I understand the proposed Worldcat API correctly,
OpenSearch is used only for pre-formatted citations in HTML. ]

 - Godmar

On Tue, Jun 24, 2008 at 12:54 AM, Roy Tennant <[EMAIL PROTECTED]> wrote:

> I believe WorldCat qualifies, although the API is not yet ready for general
> release (but soon):
>
> <http://worldcat.org/devnet/index.php/SearchAPIDetails>
>
> Roy
>
>
> On 6/23/08 8:55 PM, "Godmar Back" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > are there any examples of functioning OpenSearch interfaces to library
> > catalogs or library information systems?
> >
> > I'm specifically interested in those that not only advertise a text/html
> > interface to their catalog, but who include OpenSearch response elements.
> > One example I've found is Evergreen; though it's not clear to what extent
> > this interface is used or implemented. For instance, their demo
> > installation's OpenSearch description advertises an ATOM feed, but what's
> > returned doesn't validate. (*)
> >
> > Are there other examples deployed (and does anybody know applications
> that
> > consume OpenSearch feeds?)
> >
> >  - Godmar
> >
> > (*) See, for instance:
> >
> > http://demo.gapines.org/opac/extras/opensearch/1.1/PINES/atom-full/keyword/?searchTerms=music&startPage=&startIndex=&count=&searchLang
> > which is not a valid ATOM feed:
> > http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fdemo.gapines.org%2Fopac%2Fextras%2Fopensearch%2F1.1%2FPINES%2Fatom-full%2Fkeyword%2F%3FsearchTerms%3Dmusic%26startPage%3D%26startIndex%3D%26count%3D%26searchLang
> >
>
> --
>


[CODE4LIB] use of OpenSearch response elements in libraries?

2008-06-23 Thread Godmar Back
Hi,

are there any examples of functioning OpenSearch interfaces to library
catalogs or library information systems?

I'm specifically interested in those that not only advertise a text/html
interface to their catalog, but who include OpenSearch response elements.
One example I've found is Evergreen; though it's not clear to what extent
this interface is used or implemented. For instance, their demo
installation's OpenSearch description advertises an ATOM feed, but what's
returned doesn't validate. (*)

Are there other examples deployed (and does anybody know applications that
consume OpenSearch feeds?)

 - Godmar

(*) See, for instance:
http://demo.gapines.org/opac/extras/opensearch/1.1/PINES/atom-full/keyword/?searchTerms=music&startPage=&startIndex=&count=&searchLang
which is not a valid ATOM feed:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fdemo.gapines.org%2Fopac%2Fextras%2Fopensearch%2F1.1%2FPINES%2Fatom-full%2Fkeyword%2F%3FsearchTerms%3Dmusic%26startPage%3D%26startIndex%3D%26count%3D%26searchLang


Re: [CODE4LIB] Open Source Repositories

2008-05-16 Thread Godmar Back
Generally, you won't find a credible site that would allow you to
upload unvetted binaries of adapted versions of low-volume software.
The obvious risks are just too high.

My recommendation would be a personal webpage, hosted on a site that's
associated with a real-world institution, and a real-world contact.

 - Godmar

On Fri, May 16, 2008 at 10:24 AM, Carol Bean <[EMAIL PROTECTED]> wrote:
> I probably should clarify that the friend is looking for a place to share
> what she's already fixed and compiled to run on a low resource machine (both
> in Windows and Linux)
>
> Thanks,
> Carol
>
> On Fri, May 16, 2008 at 9:52 AM, MJ Ray <[EMAIL PROTECTED]> wrote:
>
>> Carol Bean <[EMAIL PROTECTED]> wrote:
>> > Done anyone know of open source repositories that have precompiled
>> > software?  (Especially low resource software)
>>
>> As well as their own, most of the free software operating systems have
>> third-party repositories, such as those listed at
>> http://www.apt-get.org/ for debian.
>>
>> Make sure you trust the third party provider, though!
>>
>> Regards,
>> --
>> MJ Ray (slef)
>> Webmaster for hire, statistician and online shop builder for a small
>> worker cooperative http://www.ttllp.co.uk/ http://mjr.towers.org.uk/
>> (Notice http://mjr.towers.org.uk/email.html) tel:+44-844-4437-237
>>
>
>
>
> --
> Carol Bean
> [EMAIL PROTECTED]
>


[CODE4LIB] Q.: "deep-linking" syntax for Encore?

2008-05-14 Thread Godmar Back
Hi,

may I tap the collective wisdom of this list?

Is anybody using III's Encore system and happens to know if there is a
deep-linking syntax, either documented or inferred, for it?

Thanks.

 - Godmar


Re: [CODE4LIB] Latest OpenLibrary.org release

2008-05-08 Thread Godmar Back
On Thu, May 8, 2008 at 11:25 AM, Dr R. Sanderson
<[EMAIL PROTECTED]> wrote:
>
>  Like what?  The current API seems to be concerned with search.  Search
>  is what SRU does well.  If it was concerned with harvest, I (and I'm
>  sure many others) would have instead suggested OAI-PMH.
>

No, the API presented does not support search. I asked Alexis about
it, they said the search API will be released soon. If you have ideas,
why don't you share them on the ol-discuss mailing list with OL's
developers?

 - Godmar


Re: [CODE4LIB] google books and OCLC numbers

2008-05-08 Thread Godmar Back
Mark,

I'll answer this one on list, but let's take discussion that is
specifically related to GBS classes off-list since you're asking
questions about this particular software --- I had sent the first
email to Code4Lib because I felt that our method of integrating the
Google Book viewability API into III Millennium in a clean way was
worth sharing with the community.

On Thu, May 8, 2008 at 10:07 AM, Custer, Mark <[EMAIL PROTECTED]> wrote:
> Slide 4 in that PowerPoint mentions something about a "small set of
> >  Google Book Search information", but it also says that the items are
>  indexed by ISBN, OCLC#, and LCCN.  And yet, during the admittedly brief
>  time that I tried out this really nice demo, I was unable to find any
>  links to books that were available in "full view", which made me wonder
>  if any of the search results were searching GBS with their respective
>  OCLC #s (and not just ISBNs, if available).

GBS searches by whatever you tell it: ISBN, OCLC, *OR* LCCN. Not all of them.

>
>  For example, if I use the demo site that's provided and search for "mark
>  twain" and limit my results to publication dates of, say, 1860-1910, I
>  don't receive a single GBS link.  So I checked to see if "Eve's Diary"
>  was in GBS and, of course, it was... and then I made sure that the copy
>  I found in the demo had the same OCLC# as the one in GBS; and it was.
>  So, is this a feature that will be added later, or is it just that the
>  entire set of bib records available at the demo site are not included in
>  the GBS aspect of the demo?

By "demo site" provided, do you mean addison.vt.edu:2082?
Remember that in this demo, the link is only displayed if Google has a
partial view, and *not* if Google has full text or no view. It's my
understanding that Twain's books are past copyright, so Google has
fully scanned them and they are available as full text.

If you take that into account, Eve's Diary (OCLC# 01052228) works
fine. I added it at the bottom of http://libx.org/gbs/tests.html
To search for this book by OCLC, you'd use this span:

<span title="OCLC:01052228" class="gbs-link-to-preview">Eve's Diary</span>

which links to the full text version. Note that --- interestingly ---
Google does not appear to have a thumbnail for this book's cover.

>
>  Secondly, I have another question which I hope that someone can clear up
>  for me.  Again, I'll use this copy of "Eve's Diary" as an example, which
>  has an OCLC number of 01052228.  Now, if you search worldcat.org (using
>  the advanced search, basic search, or even adding things like "oclc:"
>  before the number), the only way that I can access this item is to
>  search for "1052228" (removing the leading zero).  And this is exactly
>  how the OCLC number displays in the metadata record, directly below the
>  field that states that there are 18 editions of this work.
>
>  All of that said, I can still access the book with either of these URLs:
>
>  http://worldcat.org/wcpa/oclc/1052228
>  http://worldcat.org/wcpa/oclc/01052228
>
>  Now, I could've sworn that GBS followed a similar route, and so, I
>  previously searched it for OCLC numbers by removing any leading zeroes.
>  As of at least today, though, the only way for me to access this book
>  via GBS is to use the OCLC number as it appears in the MARC record...
>  that is, by searching for "oclc01052228".
>
>  Has anyone else noticed this change in GBS (though it's quite possible
>  that I'm simply mistaken)?  And could anyone inform me about the
>  technical details of any of these issues?  I mean, I get that worldcat
>  has to also deal with ISSNs, but is there a way to use the search box to
>  explicitly declare what type of number the query is... and why would the
>  value need to have the any leading 0's removed in the metadata display
>  (especially since the URL method can access either)?
>

That's a question about the search interface accessed at
books.google.com, not about the book viewability API. Those are two
different services. The viewability API advertises that it supports
OCLC: and LCCN: prefixes to search for OCLC and LCCN, respectively, in
addition to ISBNs, and that works in your example, for instance,
visit:

http://books.google.com/books?jscmd=viewapi&bibkeys=OCLC:01052228&callback=X
or
http://books.google.com/books?jscmd=viewapi&bibkeys=OCLC:1052228&callback=X
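
Server-side, the response is just the callback wrapped around a JSON
object. A sketch of unwrapping it (info_url is the documented field;
"preview" is assumed from observed responses, and the browser-like
User-Agent anticipates Google's rejection of non-browser agents):

  import json, re
  from urllib.request import Request, urlopen

  url = ("http://books.google.com/books?jscmd=viewapi"
         "&bibkeys=OCLC:01052228&callback=X")
  req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
  raw = urlopen(req).read().decode("utf-8")
  data = json.loads(re.search(r"X\((.*)\)", raw, re.S).group(1))
  for bibkey, info in data.items():
      print(bibkey, info.get("preview"), info.get("info_url"))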

The books.google.com search interface doesn't advertise the ability to
search by OCLC number --- the only reason you are successful with
searching for OCLC01052228 is because this string happens to occur
somewhere in this book's metadata description, and Google has the full
content of the metadata descriptions indexed like it indexes webpages.

Take also a look at the advanced search interface at:
http://books.google.com/advanced_book_search
You'll find no support for OCLC or LCCN. It does show, however, that
isbn: can be used to search for ISBNs, in the style in which prefixes
can be used in other search interfaces.

 - Godmar


Re: [CODE4LIB] coverage of google book viewability API

2008-05-06 Thread Godmar Back
On Tue, May 6, 2008 at 11:02 PM, Michelle Watson
<[EMAIL PROTECTED]> wrote:
>
>  Is there something in the code that prevents the link from being
>  offered unless it goes to at least a partial preview (which I take to
>  mean scanned pages), or have I just been lucky in my searching?  I
>  can't comment on whether or not the 'no preview'  is useful because
>  every book I see has some scanned content.
>

Yes, in Annette's example, the link is only offered if Google has
preview pages in addition to the book information. See the docs on
libx.org/gbs for further detail (look for gbs-if-partial).

I had the same subjective impression in that I was surprised by how
many books have previews - for instance, if I search for "genomics" on
addison.vt.edu:2082, 24 of the first 50 hits returned have partial
previews. Incidentally, 2 out of the 24 lead to the wrong book.
This is why I sampled the LoC's ISBN set.

It's likely that there's observer bias (such as trying "genomics"),
and it's also possible that Google is more likely to have previews for
books libraries tend to hold, such as popular or recent books. (I note
that most of the 24 hits for genomics that have previews are less than
4 years old.)
Conversely, for those recent years, precision may be lower, with more
books misindexed.

 - Godmar


Re: [CODE4LIB] coverage of google book viewability API

2008-05-06 Thread Godmar Back
On Tue, May 6, 2008 at 8:24 PM, Tim Spalding <[EMAIL PROTECTED]> wrote:
> 0.2% full text? Yowch!
>
>  Do academic libraries with full-text versions of the book on their
>  shelves really want to point people to no-preview pages on Google.

In the example I show on the slides to which I pointed, the link to
Google is displayed only for books for which partial previews exist.
For this, the use case is clear - users can browse at least some pages
of the book before deciding to head to the library to check the book
out.

>
>  Doing LCCNs and OCLC numbers for older books is a must.

Yes, I'll repeat the experiments for LCCNs and OCLC numbers if I have
time. In the LoC dataset, only about 3.2 million of about 7.5 million records
had ISBNs.

 - Godmar


Re: [CODE4LIB] coverage of google book viewability API

2008-05-06 Thread Godmar Back
ps: the distribution of the full text availability for the sample
considered was as follows:

No preview: 797 (93.5%)
Partial preview: 53 (6.2%)
Full text: 2 (0.2%)

 - Godmar

On Tue, May 6, 2008 at 6:09 PM, Godmar Back <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  to examine the usability of Google's book viewability API when lookup
>  is done via ISBN, we did some experiments, the results of which I'd
>  like to share. [1]
>
>  For 1000 randomly drawn ISBN from 3,192,809 ISBN extracted from a
>  snapshot of LoC's records [2], Google Books returned results for 852
>  ISBN.  We then downloaded the page that was referred to in the
>  "info_url" parameter of the response (which is the "About" page Google
>  provides) for each result.
>
>  To examine whether Google retrieved the correct book, we checked if
>  the Info page contained the ISBN for which we'd searched. 815 out of
>  852 contained the same ISBN. 37 results referred to a different ISBN
>  than the one searched for.  We examined the 37 results manually: 33
>  referred to a different edition of the book whose ISBN was used to
>  search, as judged by comparing author/title information with OCLC's
>  xISBN service. (We compared the author/title returned by xISBN with
>  the author/title listed on Google's book information page.)  4 records
>  appeared to be misindexed.
>
>  I found the results (85.2% recall and >99% precision, if you allow for
>  the ISBN substitution; with a 3.1% margin of error) surprisingly high.
>
>   - Godmar
>
>  [1] http://top.cs.vt.edu/~gback/gbs-accuracy-study/
>  [2] http://www.archive.org/details/marc_records_scriblio_net
>


Re: [CODE4LIB] google books for III millennium

2008-05-06 Thread Godmar Back
The solution is entirely client-side; as it has to be for this
particular kind of legacy system. (In some so-called "turn-key"
versions, this particular company does not even provide access to the
server's file system, let alone the option of running any services.)

We had already discussed how it works (check the threads from March);
this particular pointer was simply a pointer about how to integrate it
into this particular system (since there were doubts back then about
how hard or easy such integration is.)

 - Godmar

On Tue, May 6, 2008 at 5:53 PM, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:
> This is interesting. These slides don't give me quite enough info to
>  figure out what's going on (I hate reading slides by themselves!), but
>  I'm curious about this statement: "Without JavaScript coding
>  (even though Google's API requires JavaScript coding as it is) ". Are
>  you making calls server-side, or are you still making them client-side?
>
>  As you may recall, one issue I keep beating upon is the desire to call
>  Google's API server-side. While it's technically possible to call it
>  server-side, Google doesn't want you to. I wonder if this is what
>  they're doing there? The problems with that are:
>
>  1) It may violate Googles terms of service
>  2) It may run up against Google traffic-limiting defenses
>  3) [Google's given reason]: It doesn't allow Google to tailor the
>  results to the end-users location (determined by IP).
>
>  Including an x-forwarded-for header _may_ get around #2 or #3. Including
>  an x-forwarded-for header should probably be considered a best practice
>  when doing this sort of thing server-side in general, but I'm still
>  nervous about doing this, and wish that Google would just plain say they
>  allow server-side calls.
>
>
>
>
>
>  Godmar Back wrote:
>
> > Hi,
> >
> > here's a pointer to follow up on the earlier discussion on how to
> > integrate Google books viewability API into closed legacy systems that
> > allow only limited control regarding what is being output, such as
> > III's Millennium system. Compared to other solutions, no JavaScript
> > programming is required, and the integration into the vendor-provided
> > templates (such as briefcit.html etc.) is reasonably clean, provides
> > targeted placement, and allows for multiple uses per page.
> >
> > Slides (excerpted from Annette Bailey's presentation at IUG 2008):
> > http://libx.org/gbs/GBSExcerptFromIUGTalk2008.ppt
> > A demo is currently available here: http://addison.vt.edu:2082/
> >
> >  - Godmar
> >
> >
> >
>
>  --
>  Jonathan Rochkind
>  Digital Services Software Engineer
>  The Sheridan Libraries
>  Johns Hopkins University
>  410.516.8886
>  rochkind (at) jhu.edu
>


Re: [CODE4LIB] google books for III millennium

2008-05-06 Thread Godmar Back
Kent,

the link you provide is for the Google API --- however, I was
referring to the Google Book Viewability API. They're unrelated, to my
knowledge.

My experience with the Google Book Viewability API is that it can be
invoked server-side (Google's terms notwithstanding), but requires a
user-agent that mimics an existing browser. A user agent such as the
one provided by Sun's JDK (I think it's "jdk-1.6" or some such) will
be rejected; a referrer URL, on the other hand, does not appear to be
required.
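
In other words, setting the header is the whole trick (a sketch in
modern Python):

  from urllib.request import Request, urlopen

  req = Request(
      "http://books.google.com/books?jscmd=viewapi&bibkeys=OCLC:1052228&callback=X",
      headers={"User-Agent": "Mozilla/5.0"})  # default Python/JDK agents are rejected
  print(urlopen(req).read()[:200])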

 - Godmar

On Tue, May 6, 2008 at 6:32 PM, Kent Fitch <[EMAIL PROTECTED]> wrote:
> Hi Jonathan,
>
>  The Google API can now be invoked guilt-free from server-side, see:
>
>  http://code.google.com/apis/ajaxsearch/documentation/#fonje
>
>  "For Flash developers, and those developers that have a need to access
>  the AJAX Search API from other Non-Javascript environments, the API
>  exposes a simple RESTful interface. In all cases, the method supported
>  is GET and the response format is a JSON encoded result set with
>  embedded status codes. Applications that use this interface must abide
>  by all existing terms of use. An area to pay special attention to
>  relates to correctly identifying yourself in your requests.
>  Applications MUST always include a valid and accurate http referer
>  header in their requests. In addition, we ask, but do not require,
>  that each request contains a valid API Key. By providing a key, your
>  application provides us with a secondary identification mechanism that
>  is useful should we need to contact you in order to correct any
>  problems."
>
>  Well, guilt-free if you agree to the terms, which include:
>
>  "The API may be used only for services that are accessible to your end
>  users without charge."
>
>  "You agree that you will not, and you will not permit your users or
>  other third parties to: (a) modify or replace the text, images, or
>  other content of the Google Search Results, including by (i) changing
>  the order in which the Google Search Results appear, (ii) intermixing
>  Search Results from sources other than Google, or (iii) intermixing
>  other content such that it appears to be part of the Google Search
>  Results; or (b) modify, replace or otherwise disable the functioning
>  of links to Google or third party websites provided in the Google
>  Search Results."
>
>  Regards,
>
>  Kent Fitch
>
>
>
>  On Wed, May 7, 2008 at 7:53 AM, Jonathan Rochkind <[EMAIL PROTECTED]> wrote:
>  > This is interesting. These slides don't give me quite enough info to
>  >  figure out what's going on (I hate reading slides by themselves!), but
>  >  I'm curious about this statement: "Without JavaScript coding
>  >  (even though Google's API requires JavaScript coding as it is) ". Are
>  >  you making calls server-side, or are you still making them client-side?
>  >
>  >  As you may recall, one issue I keep beating upon is the desire to call
>  >  Google's API server-side. While it's technically possible to call it
>  >  server-side, Google doesn't want you to. I wonder if this is what
>  >  they're doing there? The problems with that are:
>  >
>  >  1) It may violate Google's terms of service
>  >  2) It may run up against Google traffic-limiting defenses
>  >  3) [Google's given reason]: It doesn't allow Google to tailor the
>  >  results to the end-user's location (determined by IP).
>  >
>  >  Including an x-forwarded-for header _may_ get around #2 or #3. Including
>  >  an x-forwarded-for header should probably be considered a best practice
>  >  when doing this sort of thing server-side in general, but I'm still
>  >  nervous about doing this, and wish that Google would just plain say they
>  >  allow server-side calls.
>  >
>  >
>  >
>  >
>  >
>  >  Godmar Back wrote:
>  >
>  > > Hi,
>  > >
>  > > here's a pointer to follow up on the earlier discussion on how to
>  > > integrate Google books viewability API into closed legacy systems that
>  > > allow only limited control regarding what is being output, such as
>  > > III's Millennium system. Compared to other solutions, no JavaScript
>  > > programming is required, and the integration into the vendor-provided
>  > > templates (such as briefcit.html etc.) is reasonably clean, provides
>  > > targeted placement, and allows for multiple uses per page.
>  > >
>  > > Slides (excerpted from Annette Bailey's presentation at IUG 2008):
>  > > http://libx.org/gbs/GBSExcerptFromIUGTalk2008.ppt
>  > > A demo is currently available here: http://addison.vt.edu:2082/
>  > >
>  > >  - Godmar
>  > >
>  > >
>  > >
>  >
>  >  --
>  >  Jonathan Rochkind
>  >  Digital Services Software Engineer
>  >  The Sheridan Libraries
>  >  Johns Hopkins University
>  >  410.516.8886
>  >  rochkind (at) jhu.edu
>  >
>


[CODE4LIB] coverage of google book viewability API

2008-05-06 Thread Godmar Back
Hi,

to examine the usability of Google's book viewability API when lookup
is done via ISBN, we did some experiments, the results of which I'd
like to share. [1]

For 1,000 ISBNs randomly drawn from the 3,192,809 ISBNs extracted from
a snapshot of LoC's records [2], Google Books returned results for
852. We then downloaded, for each result, the page referred to by the
"info_url" parameter of the response (which is the "About" page Google
provides).
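
In outline, the per-ISBN step looked roughly like this (a simplified
sketch rather than our actual harness; it assumes the stdlib json
module and omits error handling and politeness delays):

-
#!/usr/bin/python
# Simplified sketch of the per-ISBN step: query the viewability API,
# follow info_url, and check whether the "About" page mentions the
# ISBN we searched for.
import json, urllib2

def classify(isbn):
    url = ('http://books.google.com/books?jscmd=viewapi'
           '&bibkeys=ISBN:%s&callback=cb' % isbn)
    body = urllib2.urlopen(url).read().strip()
    data = json.loads(body[len('cb('):-len(');')])  # unwrap cb(...);
    if not data:
        return 'no result'                   # 148 of our 1,000
    about = urllib2.urlopen(data.values()[0]['info_url']).read()
    return 'same isbn' if isbn in about else 'different isbn'
--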

To examine whether Google retrieved the correct book, we checked
whether the info page contained the ISBN for which we'd searched. 815
out of 852 did; the other 37 results referred to a different ISBN than
the one searched for. We examined those 37 manually: 33 referred to a
different edition of the book whose ISBN we had searched for, as
judged by comparing the author/title returned by OCLC's xISBN service
with the author/title listed on Google's book information page; the
remaining 4 records appeared to be misindexed.

I found the results surprisingly high: 85.2% recall (852 of 1,000) and
over 99% precision (848 of 852, counting the 33 edition substitutions
as correct), with a margin of error of about 3.1 percentage points.
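
For the record, here's the arithmetic behind those figures (assuming
the 3.1% is the worst-case binomial margin of error at 95% confidence
for a sample of 1,000):

-
# Recomputing the reported figures from the raw counts above.
found, sample = 852, 1000
correct = 815 + 33                    # exact matches + edition substitutions
print float(found) / sample           # recall: 0.852
print float(correct) / found          # precision: ~0.995
print 1.96 * (0.25 / sample) ** 0.5   # worst-case 95% margin: ~0.031
--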

 - Godmar

[1] http://top.cs.vt.edu/~gback/gbs-accuracy-study/
[2] http://www.archive.org/details/marc_records_scriblio_net


[CODE4LIB] google books for III millennium

2008-05-06 Thread Godmar Back
Hi,

here's a pointer to follow up on the earlier discussion on how to
integrate Google books viewability API into closed legacy systems that
allow only limited control regarding what is being output, such as
III's Millennium system. Compared to other solutions, no JavaScript
programming is required, and the integration into the vendor-provided
templates (such as briefcit.html etc.) is reasonably clean, provides
targeted placement, and allows for multiple uses per page.

Slides (excerpted from Annette Bailey's presentation at IUG 2008):
http://libx.org/gbs/GBSExcerptFromIUGTalk2008.ppt
A demo is currently available here: http://addison.vt.edu:2082/

 - Godmar


Re: [CODE4LIB] how to obtain a sampling of ISBNs

2008-04-29 Thread Godmar Back
Thank you all for the replies.

To summarize:

- Tim Spalding offered LibraryThing's database at
http://www.librarything.com/wiki/index.php/LibraryThing_APIs
- Roy Tennant pointed at MIT's Barton dump: available at
<http://simile.mit.edu/rdf-test-data/>

but the winner is probably this Python script, based on Ed's suggestion:

-
#!/usr/bin/python

from urllib import urlopen
from pymarc import MARCReader

# the LoC/Scriblio MARC dump comes in numbered parts: part01.dat .. part29.dat
locrecordspattern = \
    'http://www.archive.org/download/marc_records_scriblio_net/part%02d.dat'

for part in range(1, 30):
    for record in MARCReader(urlopen(locrecordspattern % part)):
        # 020$a holds the ISBN, when present
        if record['020'] and record['020']['a']:
            print record['020']['a']
--

Now if I could only figure out how to install "easy_install" on FC8 so
I didn't have to run it with:
env PYTHONPATH=`pwd`/pymarc-2.21 ./readloc.py

 - Godmar

On Tue, Apr 29, 2008 at 8:20 AM, Ed Summers <[EMAIL PROTECTED]> wrote:
> You could download a snapshot of the full LC back file at the Internet
>  Archive (kindly donated by Scriblio).
>
>   http://www.archive.org/details/marc_records_scriblio_net
>
>  Then run a script using your favorite MARC parsing library (mine
>  currently is pymarc):
>
>   from pymarc import MARCReader
>
>   for record in MARCReader(file('part01.dat')):
>       if record['020'] and record['020']['a']:
>           print record['020']['a']
>
>  //Ed
>
>
>
>  On Mon, Apr 28, 2008 at 9:35 AM, Godmar Back <[EMAIL PROTECTED]> wrote:
>  > Hi,
>  >
>  >  for an investigation/study, I'm looking to obtain a representative
>  >  sample set (say a few hundreds) of ISBNs. For instance, the sample
>  >  could represent LoC's holdings (or some other acceptable/meaningful
>  >  population in the library world).
>  >
>  >  Does anybody have any pointers/ideas on how I might go about this?
>  >
>  >  Thanks!
>  >
>  >   - Godmar
>  >
>


Re: [CODE4LIB] how to obtain a sampling of ISBNs

2008-04-28 Thread Godmar Back
Hi,

thanks to everybody who's replied with offers to provide ISBNs.

I need to clarify that I'm looking for a sample of ISBNs that is
representative of some larger population, such as "all books cataloged
by LoC", or "all books in library X's catalog", or "all books sold by
Amazon."

It could be, for instance, a simple random sample [1].

What will not work are ISBNs coming from a FRBR service, from a
specialized collection, or the "first n ISBNs" coming from a catalog
dump (unless the order in which the catalog database is dumped is
explicitly random).
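
To make that concrete, a minimal sketch (assuming you already have a
flat file of candidate ISBNs, one per line, drawn from the whole
population; the filename and sample size are placeholders):

-
#!/usr/bin/python
# Minimal sketch: draw a simple random sample from a one-ISBN-per-line
# dump, so every ISBN in the population has the same chance of being
# picked -- exactly what the "first n lines" of a dump does not give.
# 'isbns.txt' and the sample size of 300 are placeholders.
import random

isbns = [line.strip() for line in file('isbns.txt') if line.strip()]
for isbn in random.sample(isbns, 300):
    print isbn
--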

 - Godmar

[1] http://en.wikipedia.org/wiki/Simple_random_sample

On Mon, Apr 28, 2008 at 10:40 AM, Shanley-Roberts, Ross A. Mr.
<[EMAIL PROTECTED]> wrote:
> I could give you any number of sets of ISBNs. What kind of material are you
> interested in: videos, books, poetry, electronic resources, etc.? Or I could
> supply a set of ISBNs for any subject area or LC classification area that you
> might be interested in.
>
>  Ross
>
>
>  Ross Shanley-Roberts
>  Special Projects Technologist
>  Miami University Libraries
>  Oxford, OH 45056
>  [EMAIL PROTECTED]
>  847 672-9609
>  847 894-3911 cell
>
>
>
>
>  -Original Message-
>  From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Godmar Back
>  Sent: Monday, April 28, 2008 8:35 AM
>  To: CODE4LIB@LISTSERV.ND.EDU
>  Subject: [CODE4LIB] how to obtain a sampling of ISBNs
>
>
>
> Hi,
>
>  for an investigation/study, I'm looking to obtain a representative
>  sample set (say a few hundreds) of ISBNs. For instance, the sample
>  could represent LoC's holdings (or some other acceptable/meaningful
>  population in the library world).
>
>  Does anybody have any pointers/ideas on how I might go about this?
>
>  Thanks!
>
>   - Godmar
>


[CODE4LIB] how to obtain a sampling of ISBNs

2008-04-28 Thread Godmar Back
Hi,

for an investigation/study, I'm looking to obtain a representative
sample set (say a few hundreds) of ISBNs. For instance, the sample
could represent LoC's holdings (or some other acceptable/meaningful
population in the library world).

Does anybody have any pointers/ideas on how I might go about this?

Thanks!

 - Godmar

