[CODE4LIB] PREMIS question

2010-03-04 Thread Tim Shearer

Hi folks,

Ignoring, for the moment, the utility of doing so...has anyone written 
(or does anyone know of) an XSL transform from PREMIS to HTML?


I'm finding PREMIS transforms, but nothing that produces output for 
consumption in a webpage.


The idea is to let folks more easily parse the PREMIS information for 
objects in an IR.  That is to say, not make them parse the xml directly.


I do have a request into LC but no response yet.

Even pointers to a suspect/contact would be welcome.

[As an aside, if you've not attended the conference, I just went to my 
first and it was pound for pound the best one I've attended.  So you 
need to find a way to go.]


Thanks,
Tim


Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-03 Thread Tim Shearer

A big old thank you to OCLC for the support!  It is deeply appreciated.

-t

On 3/3/10 10:34 AM, Roy Tennant wrote:

On 3/3/10 7:22 AM, Ross Singer <rossfsin...@gmail.com> wrote:


On Wed, Mar 3, 2010 at 9:55 AM, Paul Joseph <pjjos...@gmail.com> wrote:

No need to be concerned about the vendors: they're the same suspects who
sponsored C4L10.


Just to be clear on this -- the same suspects actually shelled out far
less for C4L10 than they had in the past.


Just to clarify the clarification, OCLC continued our support at the highest
level this year, as we have since the conference began.
Roy


[CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Tim Shearer

Hi Folks,

Looking for help/perspectives.

Anyone got any clever solutions for allowing folks to find a word with 
diacritics in a rendered web page regardless of whether or not the user 
tries with or without diacritics.


In indexes this is usually solved by indexing the word with and without, 
so the user gets what they want regardless of how they search.


Thanks in advance for any ideas/enlightenment,
Tim


Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Tim Shearer

Are you referring to a find-in-page, where a user presses CTRL-F
in the browser?


Yes, sorry to be unclear.


If so, it will depend on the browser.  Google Chrome 2.0 will find
matches regardless of the diacritics (i.e. the user can type "placa" and
it matches "plaça", and vice versa).  This doesn't seem to work in
Firefox 3.0.13 or IE8.


Exactly, and FF and IE are the most common browsers we're seeing.

I was wondering if someone (I know this sounds crazy) has explored the 
idea of marking up the non-diacritic inline version of the word in a span 
styled in such a way as to make it findable but not intrusive.
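For what it's worth, the with-and-without idea can be sketched in Python with the standard library; the span markup and class name below are purely illustrative, not a tested recipe. One known catch: text hidden with display:none is usually skipped by the browser's find-in-page, so the folded copy would need to be rendered but visually de-emphasized.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (e.g. "plaça" -> "placa")."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def dual_form_span(word: str) -> str:
    """Emit the word plus an ASCII-folded copy in a span for in-page find.
    The class name and attributes are hypothetical, for illustration only."""
    folded = strip_diacritics(word)
    if folded == word:
        return word  # nothing to fold, emit the word unchanged
    return f'{word}<span class="folded" aria-hidden="true">{folded}</span>'
```

The same `strip_diacritics` helper is what an indexer would use to store both forms of each word, which is the approach Tim describes for indexes.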


-t


Keith


On Fri, Aug 28, 2009 at 12:17 PM, Tim Shearer <sh...@ils.unc.edu> wrote:

Hi Folks,

Looking for help/perspectives.

Anyone got any clever solutions for allowing folks to find a word with
diacritics in a rendered web page regardless of whether or not the user
tries with or without diacritics.

In indexes this is usually solved by indexing the word with and without, so
the user gets what they want regardless of how they search.

Thanks in advance for any ideas/enlightenment,
Tim



[CODE4LIB] OCA API

2009-05-15 Thread Tim Shearer

Hi Folks,

The University Library at UNC-Chapel Hill has created an OCA API.  We have 
harvested (and continue to harvest) standard bibliographic identifiers and 
link them to OCA identifiers.  The API is deliberately modeled after 
Google's for ease of implementation.


Here is a subject search in UNC's catalog for "North Carolina" limited to 
the 19th century.


http://search.lib.unc.edu/search?Ntk=Subject&Ne=2+200043+206475+206590+11&N=206596&Ntt=north%20carolina

You will see links to OCA as well as Google.  (The full record has an OCA 
icon if you want to look.)  Right now we are only banging against the API 
with OCLC numbers, but ISSNs, ISBNs and LC numbers are in there.
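Since UNC's API was deliberately undocumented at this point, the sketch below only illustrates the Google Books Dynamic Links request pattern it is said to mimic: a comma-separated `bibkeys` list of prefixed identifiers. The endpoint path and parameter layout here are guesses, not UNC's actual interface.

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- UNC's real API was not public when this was written.
BASE = "http://search.lib.unc.edu/oca/books"

def bibkeys_url(identifiers: dict) -> str:
    """Build a Google-Books-style lookup URL,
    e.g. bibkeys=OCLC:1234,ISBN:0385472579 (sorted for stable output)."""
    keys = ",".join(
        f"{scheme}:{value}"
        for scheme, values in sorted(identifiers.items())
        for value in values
    )
    return f"{BASE}?{urlencode({'jscmd': 'viewapi', 'bibkeys': keys})}"
```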


We are looking for a couple of partners to work with to take its use beyond 
our local OPAC.  You would be ideal if you are interested, you already 
use the Google API, and you have a significant corpus of pre-1923 works in 
your catalog.


As the Google API is familiar to many of you, it would be easy to figure 
out how to implement UNC's without working with us.  Please hold off until 
we are ready to open it up all the way; this is why we've not yet put up 
documentation.


Caveats and other notes (feel free to skip):

*We realize that Open Library has an API, but we had already gone a goodly 
distance and we are finding relatively meaningful differences in coverage 
and utility.


*We collect the data from OCA as it comes in (the data should be up to 
date within a half hour or so)...but they occasionally have need to 
correct/remove works.  Right now we are actively working on this issue, 
but do not yet have a great mechanism to pull deletes and update corrected 
identifiers.


*The data is only as good as the data we harvest.  There are a small 
number of bad links.  See above.


*Excerpt from a developer on UNC's holdings (we are an OCA Scribe site):

...I decided to run the same script against the [production] database as 
well to see how much the matching is changing over time with continual 
updates:

- 429311 OCLC's tested
- 72350 matched
- 2599 of the matches were scanned by UNC

So that's 808 new matches since the end of March, not too bad for one 
month.


Effectively we are now linking to ~72 K digitized works that we were not 
previously able to provide (though as Google digitized books are being 
added to OCA, there is significant overlap).


*When we do open it up, it is the API we are offering; we are not prepared 
to be crawled for data.  If you want the data, get in touch and we will 
see what we can do.


If you are interested in being an early partner, please drop me a line and 
I will be in touch.


Tim

+++
Tim Shearer

Web Development Coordinator
The University Library
University of North Carolina at Chapel Hill
sh...@ils.unc.edu
919-962-1288
+++


[CODE4LIB] amazon s3?

2008-11-10 Thread Tim Shearer

Hi Folks,

Anybody doing mass storage for their library/consortium on amazon s3?

Anybody rejected it as an idea?

Willing to share?  Please do.

Tim

+++
Tim Shearer

Web Development Coordinator
The University Library
University of North Carolina at Chapel Hill
[EMAIL PROTECTED]
919-962-1288
+++


Re: [CODE4LIB] creating call number browse

2008-09-30 Thread Tim Shearer

Owen,

Unless I'm misunderstanding, what's being asked for is a visualization 
tool for the *classification*.  Faceted browsing by subject is dandy, but 
is not at all the same thing (though arguments can be made that the lines 
are blurring).  Books that sit next to each other in a classification (DC 
or LC, or whatever) may not share a majority of subject terms.  That 
collocation via classification is yet another (and occasionally more 
useful) way of saying that this item is like that item, one that is not 
necessarily captured in any other way than the call number.


-t

On Tue, 30 Sep 2008, Stephens, Owen wrote:


I'd second Steve's comments - replicating an inherently limited physical
browse system seems an odd thing to do in the virtual world. I would
have thought that the 'faceted browse' function we are now seeing
appearing in library systems (of course, the Endeca implementation is a
leader here) is potentially the virtual equivalent of 'browsing the
shelves', but hopefully without the limitations that the physical
environment brings?

Is it the UI rather than the functionality that is lacking here? Perhaps
we need to look more carefully at the 'browsing' experience. Thinking
about examples outside the library world, I personally like the
'coverflow' browse in iTunes, but I'm able to sort tracks by several
criteria and still see a coverflow view. I have to admit that in general
I prefer the 'album' order when using coverflow, because otherwise it
doesn't make sense (to me that is). It would be interesting to look at
what an 'artistflow' might look like, or a 'genreflow'.

However, as far as I know I can't actually replicate the experience that
I would have with my (now in boxes somewhere) physical CD collection,
which was divided by genre, then sorted by artist surname (OK, I admit it,
I'm a librarian through and through).

Perhaps a better understanding of the 'browse' experience is needed?

Some questions - when we browse:

When and why do people browse rather than search?
How do people make decisions about useful items as they browse?
Browsing stacks suggests that items have been 'ordered' - is there
something about this that appeals? Does it convey 'authority' in some
way that the 'any order you want' doesn't?

Owen

Owen Stephens
Assistant Director: eStrategy and Information Resources
Central Library
Imperial College London
South Kensington Campus
London
SW7 2AZ

t: +44 (0)20 7594 8829
e: [EMAIL PROTECTED]

-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of
Steve Meyer
Sent: 29 September 2008 21:45
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] creating call number browse

one counter argument that i would make to this is that we consistently
hear from faculty that they absolutely adore browsing the

stacks--there

is something that they have learned to love about the experience
regardless of whether they understand that it is made possible by the
work of catalogers assigning call numbers and then using them for
ordering the stacks.

at uw-madison we have a faculty lecture series where we invite
professors to talk about their use of library materials and their
research and one historian said outright, the one thing that is missing
in the online environment is the experience of browsing the stacks. he
seemed to understand that with all the mass digitization efforts, we
could be on the edge of accomplishing it.

that said, i agree that we should do what you say also, just that we
should not throw the baby out w/ the bath water. if faculty somehow
understand that browsing the stacks is a good experience then we can
use
it as a metaphor in the online environment. in an unofficial project i
have experimented w/ primitive interface tests using both subject
heading 'more like this' and a link to a stack browse based on a call
number sort:

http://j2ee-dev.library.wisc.edu/sanecat/item.html?resourceId=951506

(please, ignore the sloppy import problems, i just didn't care that
much
for the interface test)

as for the original question, this has about a million records and
900,000 w/ item numbers and a simple btree index in the database sorts
at an acceptable speed for a development test.
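A caveat if anyone tries the same btree approach: the index only helps if the stored key sorts in true call number order, and raw strings don't (lexically "QA9" lands after "QA76", though 9 < 76). A minimal, admittedly crude normalization sketch, not a full LC sorting scheme:

```python
import re

def lc_sort_key(call_number: str):
    """Crudely normalize an LC call number into a sortable tuple.
    Real schemes handle cutters, dates, and editions more carefully."""
    m = re.match(r"([A-Z]+)\s*(\d+(?:\.\d+)?)?\s*(.*)", call_number.upper())
    if not m:
        return (call_number, 0.0, "")
    letters, number, rest = m.groups()
    # Compare the class number numerically, not as text.
    return (letters, float(number) if number else 0.0, rest)
```

Storing the tuple (or a padded string built from it) as the indexed column is what makes the database sort come out in shelf order.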

-sm

Walker, David wrote:

a decent UI is probably going to be a bigger job


I've always felt that the call number browse was a really useful
option, but the most disastrously implemented feature in most ILS
catalog interfaces.

I think the problem is that we're focusing on the task -- browsing
the shelf -- as opposed to the *goal*, which is, I think, simply to
show users books that are related to the one they are looking at.

If you treat it like that (here are books that are related to this
book) and dispense with the notion of call numbers and shelves in the
interface (even if what you're doing behind the scenes is in fact a
call number browse) then I think you can arrive at a much simpler and
straight-forward UI for users.  I would treat it little different 

Re: [CODE4LIB] LOC Authority Data

2008-09-29 Thread Tim Shearer

Socialized medicine?  Sure.  *We* have authority files!

-t

On Tue, 23 Sep 2008, David Fiander wrote:


One of the most important pages in the print volumes of the Library of
Congress Subject Headings (LCSH), is the title page verso, which
includes publication and copyright details. The folks at LC very
clearly understand US copyright law, since on that page you can see
that they claim that the LCSH is copyright LC _outside of the United
States of America_.

The same probably holds true for the copyright claim on the name
authority files. You folks in the United States can do what you will
with impunity, but us unwashed masses beyond your shores are likely to
get in trouble. Probably the next time we attempt to cross the border.

- David

On Tue, Sep 23, 2008 at 5:21 PM, Jason Griffey [EMAIL PROTECTED] wrote:

As I mentioned, they are available from Ibiblio on the link above. The
copyright claim is...well...specious at best. But no one really wants
to be the one to go to court and prove it. They've been publicly
available for more than a year now on the Fred 2.0 site, and they
haven't been sued, to my knowledge.

Jason


On Tue, Sep 23, 2008 at 5:17 PM, Nate Vack [EMAIL PROTECTED] wrote:

On Tue, Sep 23, 2008 at 3:49 PM, Bryan Baldus
[EMAIL PROTECTED] wrote:


One way (as you likely know) (official, expensive) is via The Library of 
Congress Cataloging Distribution Service:


Huh. They claim copyright of these records. I'd somehow thought:

1: The federal government can't hold copyrights

2: As purely factual data, catalog records are conceptually uncopyrightable

Anyone who knows more about this than I do know if they're *really*
copyrighted, or if it's more of a "we're gonna try and say they're
copyrighted and hope no one ignores us"?

Curious,
-Nate







Re: [CODE4LIB] creating call number browse

2008-09-21 Thread Tim Shearer

Hi,

One approach to the UI might be to use Cooliris (was piclens) and generate 
a media rss file in call number order.  It's limited (to people who have 
installed cooliris) but it's essentially a coverflow.   You can do other 
things within the browser, but few are going to feel as immediate and 
transparent to the user.


Again, maybe not for all users, but maybe a cool enhanced version for a 
subset.


Generating that media rss file may get tricky (you need uris to thumbs and 
fulls) depending on the API from, and agreements with syndetics.


-t

On Wed, 17 Sep 2008, Charles Antoine Julien, Mr wrote:


I've done some work on this.


 What I don't know is whether there are any indexing / SQL / query techniques that 
could be used to browse forward and backward in an index like this.

Depending on what you want to do exactly, yes.  Look at

"Querying Ontologies in Relational Database Systems", S. Trissl and U. Leser, Lecture Notes in Computer Science, 2005, Springer.

If you need more you're looking at CS literature concerning treatment of 
graphs, directed graphs, cyclical, transitive closure, etc.

This can all be done without too much difficulty, but as Nate pointed out 
updating the data is a problem...I've not tackled that part but there is much 
literature on dynamic graphs and I'm assuming this could also be adequately 
solved.


a decent UI is probably going to be a bigger job


Yes, that's the real issue.  Could call numbers be placed within a hierarchy?  Then 
display this in an outline view (like Windows Explorer) that is also item searchable?  Seems 
to me there is structure in the call numbers that is hidden in current UIs.  I also think 
the actual call number should disappear and be replaced by a textual label 
describing what the numbers mean.

Fun stuff to think about...

Charles-Antoine


-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of Emily Lynema
Sent: September 17, 2008 11:46 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] creating call number browse

Hey all,

I would love to tackle the issue of creating a really cool call number
browse tool that utilizes book covers, etc. However, I'd like to do this
outside of my ILS/OPAC. What I don't know is whether there are any
indexing / SQL / query techniques that could be used to browse forward
and backward in an index like this.

Has anyone else worked on developing a tool like this outside of the
OPAC? I guess I would be perfectly happy even if it was something I
could build directly on top of the ILS database and its indexes (we use
SirsiDynix Unicorn).
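One answer to the forward/backward question is keyset pagination over an indexed sort key, which works in any SQL database outside the OPAC. A hedged sketch against an in-memory SQLite table; the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (call_key TEXT, title TEXT)")
conn.execute("CREATE INDEX idx_call ON items(call_key)")  # the btree doing the work
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("PS3515", "a"), ("QA007", "b"), ("QA076", "c"), ("Z0699", "d")],
)

def browse(anchor: str, n: int, forward: bool = True):
    """Return up to n call number keys after (or before) the anchor key."""
    if forward:
        sql = "SELECT call_key FROM items WHERE call_key > ? ORDER BY call_key LIMIT ?"
    else:
        # Walk backward by sorting descending, then flip to shelf order.
        sql = "SELECT call_key FROM items WHERE call_key < ? ORDER BY call_key DESC LIMIT ?"
    rows = [r[0] for r in conn.execute(sql, (anchor, n))]
    return rows if forward else rows[::-1]
```

Because each page is anchored on the last key seen rather than an OFFSET, the index is used for every page, which is what keeps it fast at a million rows.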

I wanted to throw a feeler out there before trying to dream up some wild
scheme on my own.

-emily

P.S. The version of BiblioCommons released at Oakville Public Library
has a sweet call number browse function accessible from the full record
page. I would love to know how that was accomplished.

http://opl.bibliocommons.com/item/show/1413841_mars

--
Emily Lynema
Systems Librarian for Digital Projects
Information Technology, NCSU Libraries
919-513-8031
[EMAIL PROTECTED]



Re: [CODE4LIB] what's friendlier less powerful than phpMyAdmin?

2008-08-10 Thread Tim Shearer

Hi All,

It ain't free, but there's a lovely client for mysql called navicat 
(http://www.navicat.com/) that we've been using.  And even though I *can* 
do command line queries, gotta say I love pulling lines between tables to 
set them up.  It's not too expensive and I find that for light to medium 
weight stuff it's fun and easy to use.


-t


On Wed, 30 Jul 2008, Eric Lease Morgan wrote:


On Jul 30, 2008, at 1:47 PM, Cloutman, David wrote:


Perhaps you should put together some MySQL training materials for
librarians. A webinar, perhaps. I'd love it if my colleagues had those
skills. I don't think there is that much interest, but I could be wrong.
There are at least 101 ways enterprise level database skills could be
put to work in my library. I'm pretty sick of our core technical
solutions being Excel spreadsheets and the occasional Access database.
Blech.




Tell me about it, and besides, basic SQL is not any more difficult than CCL: 
SELECT this FROM that WHERE field LIKE '%foo%'.  Moreover, IMHO, relational 
databases are the technological bread & butter of librarianship these days. 
Blissful ignorance does the profession little good.


--
Eric Lease Morgan
Hesburgh Libraries, University of Notre Dame


Re: [CODE4LIB] KR

2008-04-03 Thread Tim Shearer

So now I have to compile my jokes?

-t

On Thu, 3 Apr 2008, Ryan Ordway wrote:


#include <stdio.h>
main(t,_,a)
char *a;
{
return!0t?t3?main(-79,-13,a+main(-87,1-_,main(-86,0,a+1)+a)):
1,t_?main(t+1,_,a):3,main(-94,-27+t,a)t==2?_13?
main(2,_+1,%s %d %d\n):9:16:t0?t-72?main(_,t,
@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l+,/n{n+,/+#n
+,/#\
;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l \
q#'+d'K#!/+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw'
i;# \
){nl]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#n'wk nw' \
iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;#'rdq#w! nr'/ ') }+}
{rl#'{n' ')# \
}'+}##(!!/)
:t-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1)
:0t?main(2,2,%s):*a=='/'||main(0,main(-61,*a,
!ek;dc [EMAIL PROTECTED]'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m 
.vpbks,fxntdCeghiry),a+1);
}



On Apr 3, 2008, at 8:54 AM, Jeremy Frumkin wrote:

..- .-.. .-..   .. .. --   --. --- .. -. --.   - ---   ... .-
-.--   .-
-... --- ..- -   -  .. ...   -  .-. . .- -..   .. ...
-  .- -
-. --- -. .   --- ..-.   -.--
--- ..-   ... ..- ..-. ..-. . .-.   ..-. .-.
--- --   .-. -- ..   -  .   .-- .- -.--   ..   -..
---   .--  . -.
..   ..- ... .   -- -.--   .--. .-. . ..-. . .-. .-. . -..   ..
-. .--. ..-
-   -.. . ...- .. -.-. . .-.-.- .-.-.- .-.-.-

-- --   .--- .- ..-.


On 4/3/08 6:51 AM, Walter Lewis [EMAIL PROTECTED] wrote:


Sebastian Hammer wrote:

A true hacker has no need for these crude tools. He waits for
cosmic
radiation to pummel the magnetic patterns on his drive into a
pleasing
and functional sequence of bits.

Alas, having been doing this (along with my partners, the four
Yorkshiremen) since the Stone Age ...

We used to arrange pebbles in the middle of road into the relevant
patterns (we *dreamed* of being able to afford the wire for an
abacus).
Passing carts would then help crunch the numbers.

Walter
  for whom graph paper, templates, pencils, 80 column punchcards and
IBM Assembler were formative experiences





===
Jeremy Frumkin
Head, Emerging Technologies and Services
121 The Valley Library, Oregon State University
Corvallis OR 97331-4501

[EMAIL PROTECTED]

541.602.4905
541.737.3453 (Fax)
===
"Without ambition one starts nothing. Without work one finishes nothing."
- Emerson



--
Ryan Ordway   E-mail: [EMAIL PROTECTED]
Unix Systems Administrator   [EMAIL PROTECTED]
OSU Libraries, Corvallis, OR 97331Office: Valley Library #4657


Re: [CODE4LIB] dict protocol

2008-03-31 Thread Tim Shearer

Hi Eric,

Given the likely need to map back from an alternate name (string search in
the definition?) to the auth name (maybe the most common use for such a
service?), I think this route might be on the inefficient side.

I've been wondering about names as handles, with a crossref-like middleman
piece.  But not doing anything about such ideas.

-t

On Mon, 31 Mar 2008, Eric Lease Morgan wrote:



Over the weekend I had fun with the DICT protocol, a DICT server, a
DICT client, and the creation of dictionaries for the afore mentioned.

The DICT protocol seems to be a simple client/server protocol for
searching remote content and returning definitions of the query.
[1] I was initially drawn to the protocol for its content.
Specifically, I wanted a dictionary because I thought it would be
useful in a next generation library catalog application. The server
was trivial to install because it is available via yum. Since it is
protocol there are a number of clients and libraries available.
There's also bunches o' data to be had, albeit a bit dated. Some of
it includes: 1913 dictionary, version 2.0 of WordNet, the CIA World
Fact Book (2000), Moby's Thesaurus, a gazetteer, and quite a number
of English to other dictionaries.

What's interesting is the DICT protocol data is not limited to
dictionaries as the Fact Book exemplifies. The data really only has
two fields: headword (key), and note (definition). After thinking
about it, I thought authority lists would be a pretty good candidate
for DICT. The headword would be the term, and the definition would be
the See From and See Also listings.

Off on an adventure, I downloaded subject authorities from FRED. [2]
I used a shell script to loop through my data (subjects2dictd,
attached) which employed XSLT to parse the MARCXML
(subjects2dict.xsl, attached) and then ran various dict* utilities.
The end result is a dictionary query-able with your favorite DICT
client. From a Linux shell, try:

dict -h 208.81.177.118 -d subjects -s substring blues

While I think this is pretty kewl, I wonder whether or not DICT is
the correct approach. Maybe I should use a more robust, full-text
indexer for this problem? After all, DICT servers only look at the
headword when searching, not the definitions. On the other hand DICT
was *pretty* easy to get up and running, and authority lists are a
type of dictionary.

[1] http://www.dict.org
[2] http://www.ibiblio.org/fred2.0/authorities/
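To make Eric's caveat concrete (DICT matches against headwords, never against definitions), here is a toy model of the `dict -s substring` behavior over an invented authority sample; the terms and cross-references are made up for illustration:

```python
# Toy authority "dictionary": headword -> See/See-Also note (invented sample).
AUTHORITIES = {
    "Blues (Music)": "See also: Jazz; Rhythm and blues",
    "Rhythm and blues": "See also: Blues (Music); Soul music",
    "Chicago blues": "See: Blues (Music)",
}

def match_substring(query: str):
    """Mimic the DICT substring strategy: search headwords only,
    so text that appears only in a definition is never matched."""
    q = query.lower()
    return sorted(h for h in AUTHORITIES if q in h.lower())
```

This is exactly why a See From reference buried in a definition is invisible to a DICT search, and why a full-text indexer might serve the authority-lookup use case better.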

--
Eric Lease Morgan
University Libraries of Notre Dame


subjects2dictd
Description: Binary data





subjects2dict.xsl
Description: Binary data





[CODE4LIB] musing on oca api (Re: [CODE4LIB] oca api?)

2008-03-06 Thread Tim Shearer
%20Library)limit=10submit=submit

This is returning scanned items from the biodiversity collection,
updated between 10/31/2007 - 11/30/2007, restricted to one of our
contributing libraries (MBLWHOI Library), and limited to 10 results.

The results are styled in the browser; view source to see the good
stuff.  We use this list to grab the identifiers we've yet to ingest.

Some background: When a book is scanned through IA/OCA scanning, they
create their own unique identifier (like annalesacademiae21univ) and
grab a MARC record from the contributing library's catalog.  All of

the

scanned files, derivatives, and metadata files are stored on IA's
clusters in a directory named with the identifier.

Steve mentioned using their /details/ directive, then sniffing the page 
to get the cluster location and the files for downloading.  An easier 
method is to use their /download/ directive, as in:

http://www.archive.org/download/$ID, or in the example above:
http://www.archive.org/download/annalesacademiae21univ

That automatically does a lookup on the cluster, which means you don't
have to scrape info off pages.  You can also address any files within
that directory, as in:


http://www.archive.org/download/annalesacademiae21univ/annalesacademiae21univ_marc.xml

The only way to get standard identifiers (ISBN, ISSN, OCLC, LCCN) for
these scanned books is to grab them out of the MARC record.  So the
long-winded answer to your question, Tim, is no, there's no simple way
to crossref what IA has scanned with your catalog - THAT I KNOW OF.  Big 
caveat on that last part.

Happy to help with any other questions I can,

Chris Freeland


-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of
Steve Toub
Sent: Sunday, February 24, 2008 11:20 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] oca api?

--- Tim Shearer [EMAIL PROTECTED] wrote:


Hi Folks,

I'm looking into tapping the texts in the Open Content Alliance.

A few questions...

As near as I can tell, they don't expose (perhaps even store?) any common 
unique identifiers (oclc number, issn, isbn, loc number).


I poked around in this world a few months ago in my previous job at
California Digital Library,
also an OCA partner.

The unique key seems to be a text string identifier (one that seems to be 
completely different from the text string identifier in Open Library). 
Apparently there was talk at the last partner meeting about moving to ISBNs:


http://dilettantes.code4lib.org/2007/10/22/tales-from-the-open-content-alliance/

To obtain identifiers in bulk, I think the recommended approach is the
OAI-PMH interface, which
seems more reliable in recent months:

http://www.archive.org/services/oai.php?verb=Identify



http://www.archive.org/services/oai.php?verb=ListIdentifiers&metadataPrefix=oai_dc&set=collection:cdl

etc.
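For the bulk-harvest route, a ListIdentifiers response can be parsed with nothing but the standard library. The sketch below runs against a canned response (the identifier and token values are invented) rather than the live archive.org endpoint; real harvesting would loop, re-requesting with each resumptionToken until none comes back:

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace

def parse_list_identifiers(xml_text: str):
    """Pull (identifiers, resumptionToken) out of a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iter(OAI + "identifier")]
    tok = root.find(".//" + OAI + "resumptionToken")
    token = tok.text if tok is not None and tok.text else None
    return ids, token

# Canned, invented sample response for illustration.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:archive.org:chemicallecturee00newtrich</identifier></header>
    <resumptionToken>collection:cdl|100</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""
```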


Additional instructions if you want to grab the content files.

From any book's metadata page (e.g.,
http://www.archive.org/details/chemicallecturee00newtrich)
click through on the "Usage Rights: See Terms" link; the rights are on a 
pane on the left-hand side.

Once you know the identifier, you can grab the content files, using this syntax:
http://www.archive.org/details/$ID
Like so:
http://www.archive.org/details/chemicallecturee00newtrich

And then sniff the page to find the FTP link:
ftp://ia340915.us.archive.org/2/items/chemicallecturee00newtrich

But I think they prefer to use HTTP for these, not the FTP, so switch
this to:
http://ia340915.us.archive.org/2/items/chemicallecturee00newtrich

Hope this helps!

  --SET



We're a contributor so I can use curl to grab our records via http (and 
regexp my way to our local catalog identifiers, which they do 
store/expose).

I've played a bit with the z39.50 interface at indexdata 
(http://www.indexdata.dk/opencontent/), but I'm not confident about the 
content behind it.  I get very limited results; for instance I can't find 
any UNC records and we're fairly new to the game.

Again, I'm looking for unique identifiers in what I can get back, and it's 
slim pickings.

Anyone cracked this nut?  Got any life lessons for me?

Thanks!
Tim

+++
Tim Shearer

Web Development Coordinator
The University Library
University of North Carolina at Chapel Hill
[EMAIL PROTECTED]
919-962-1288
+++







Re: [CODE4LIB] oca api?

2008-02-27 Thread Tim Shearer
--
Sebastian Hammer, Index Data
[EMAIL PROTECTED]   www.indexdata.com
Ph: (603) 209-6853 Fax: (866) 383-4485



Re: [CODE4LIB] oca api?

2008-02-25 Thread Tim Shearer

Yup,

Chris' email was exactly what I was hoping for.  Now if there were a nice
way to pre-screen for records that don't have empty (isbn|issn|oclc#)
without all the work of looking per record (and the overhead for the
server, and the overhead if more than one organization starts to do this).

I guess I want to search for uniqueID != NULL and only get their unique id
back, and script from there.

Still and all, this now seems a very doable thing.

Chris, many thanks!
-t

On Mon, 25 Feb 2008, Tennant,Roy wrote:


Well, from where Chris left off it would be fairly easy to check for a
file in the directory with a marc.xml filename extension, then XSLT
for:

<datafield tag="010" ind1=" " ind2=" ">
  <subfield code="a">39004822</subfield>
</datafield>

If such exists, then you'll have the ISBN. To sweeten it further,
send that into xISBN or ThingISBN and get other ISBNs for the same work.
This seems completely scriptable to me. Perhaps someone at c4l will have
it done before the conference is over. And Tim, the example above is one
that's in your catalog.
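The same check could be scripted without XSLT. A sketch in Python that assumes the un-namespaced element shape shown above; real MARCXML files often use the http://www.loc.gov/MARC21/slim namespace, so adjust the element lookups accordingly.

```python
import xml.etree.ElementTree as ET

# Hedged sketch: pull subfield $a of field 010 out of an item's
# *_marc.xml. Assumes un-namespaced elements as in the snippet above;
# namespaced MARCXML would need the MARC21/slim namespace prefixed.

marc = """<record>
  <datafield tag="010" ind1=" " ind2=" ">
    <subfield code="a">39004822</subfield>
  </datafield>
</record>"""

root = ET.fromstring(marc)
value = None
for df in root.iter("datafield"):
    if df.get("tag") == "010":
        sf = df.find("subfield[@code='a']")
        if sf is not None and sf.text:
            value = sf.text.strip()
print(value)
```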
Roy

-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Chris Freeland
Sent: Monday, February 25, 2008 11:51 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] oca api?

Steve & Tim,

I'm the tech director for the Biodiversity Heritage Library (BHL), which
is a consortium of 10 natural history libraries who have partnered with
Internet Archive (IA)/OCA for scanning our collections.  We've just
launched our revamped portal, complete with more than 7,500 books & 2.8
million pages scanned by IA  other digitization partners, at:
http://www.biodiversitylibrary.org

To build this portal we ingest metadata from IA.  We found their OAI
interface to pull scanned items inconsistently based on date of
scanning, so we switched to using their custom query interface.  Here's
an example of a query we fire off:

http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit

This is returning scanned items from the biodiversity collection,
updated between 10/31/2007 - 11/30/2007, restricted to one of our
contributing libraries (MBLWHOI Library), and limited to 10 results.

The results are styled in the browser; view source to see the good
stuff.  We use this list to grab the identifiers we've yet to ingest.

Some background: When a book is scanned through IA/OCA scanning, they
create their own unique identifier (like annalesacademiae21univ) and
grab a MARC record from the contributing library's catalog.  All of the
scanned files, derivatives, and metadata files are stored on IA's
clusters in a directory named with the identifier.

Steve mentioned using their /details/ directive, then sniffing the page
to get the cluster location and the files for downloading.  An easier
method is to use their /download/ directive, as in:

http://www.archive.org/download/$ID, or in the example above:
http://www.archive.org/download/annalesacademiae21univ

That automatically does a lookup on the cluster, which means you don't
have to scrape info off pages.  You can also address any files within
that directory, as in:
http://www.archive.org/download/annalesacademiae21univ/annalesacademiae21univ_marc.xml

The only way to get standard identifiers (ISBN, ISSN, OCLC, LCCN) for
these scanned books is to grab them out of the MARC record.  So the
long-winded answer to your question, Tim, is no, there's no simple way
to crossref what IA has scanned with your catalog - THAT I KNOW OF.  Big
caveat on that last part.

Happy to help with any other questions I can,

Chris Freeland


-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Steve Toub
Sent: Sunday, February 24, 2008 11:20 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] oca api?

--- Tim Shearer [EMAIL PROTECTED] wrote:


Hi Folks,

I'm looking into tapping the texts in the Open Content Alliance.

A few questions...

As near as I can tell, they don't expose (perhaps even store?) any

common

unique identifiers (oclc number, issn, isbn, loc number).


I poked around in this world a few months ago in my previous job at
California Digital Library, also an OCA partner.

The unique key seems to be text string identifier (one that seems to be
completely different from the text string identifier in Open Library).
Apparently there was talk at the last partner meeting about moving to
ISBNs:
http://dilettantes.code4lib.org/2007/10/22/tales-from-the-open-content-a
lliance/

To obtain identifiers in bulk, I think the recommended approach is the
OAI-PMH interface, which seems more reliable in recent months:

http://www.archive.org/services/oai.php?verb=Identify

http://www.archive.org/services/oai.php?verb=ListIdentifiers&metadataPrefix=oai_dc&set=collection:cdl

etc.
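The ListIdentifiers request can be assembled the same way for any verb; a small sketch, with standard OAI-PMH parameter names:

```python
from urllib.parse import urlencode

# Sketch of building OAI-PMH request URLs against the archive.org
# endpoint; verb, metadataPrefix, and set are standard protocol
# parameters.

def oai_request(verb, **kw):
    params = {"verb": verb}
    params.update(kw)
    return "http://www.archive.org/services/oai.php?" + urlencode(params)

print(oai_request("Identify"))
print(oai_request("ListIdentifiers",
                  metadataPrefix="oai_dc",
                  set="collection:cdl"))
```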


Additional instructions if you want to grab the content files.


From any book's metadata page (e.g

Re: [CODE4LIB] Library Software Manifesto

2007-11-06 Thread Tim Shearer

Hi Roy,

Not sure how to make this succinct enough to be elegant (i.e. a bullet
point) but...

We have a large enough staff to break into software when necessary.  A
typical scenario is:

We need a feature added (or bug removed) to make workflow tenable
We request the feature (bug fix)
We hear "ok, thanks for mentioning it" or "known problem" but have
absolutely no idea of the extent of the need or where it will be prioritized
We wait a long while, give up, and develop a workaround
Two weeks after our success the company releases the feature/fix

Essentially, I'd like to know the extent of the issue.  If 90% of their
customers have reported/requested it, I would like to know this, to avoid
doing the development locally to replicate something that will be coming
(soon?).

Many times the local development makes life possible for a year or two,
so it isn't fruitless.  It's only a problem when it turns out the whole
world has been screaming, but the company doesn't want to acknowledge
where it is on the list o' priorities.

Maybe:

- I have a right to access a prioritized list of what the developers are 
working toward.  (except more elegantly phrased)

Tim





Roy Tennant wrote:


I have a presentation coming up and I'm considering doing what I'm calling a
Library Software Manifesto. Some of the following may not be completely
understandable on the face of it, and I would be explaining the meaning
during the presentation, but this is what I have so far and I'd be
interested in other ideas this group has or comments on this. Thanks,
Roy

Consumer Rights

- I have a right to use what I buy
- I have a right to the API if I've bought the product
- I have a right to accurate, complete documentation
- I have a right to my data
- I have a right to not have simple things needlessly complicated

Consumer Responsibilities

- I have a responsibility to communicate my needs clearly and specifically
- I have a responsibility to report reproducible bugs in a way as to
facilitate reproducing it
- I have a responsibility to report irreproducible bugs with as much detail
as I can provide
- I have a responsibility to request new features responsibly
- I have a responsibility to view any adjustments to default settings
critically




Re: [CODE4LIB] OpenContent SRU search of OAISter, weirdness?

2007-10-25 Thread Tim Shearer

Dumb question, no experience with the syntax, but should there be a
wildcard -or- use of something other than equals?  (Sorry if I'm way off
base, but most query syntax I use requires "like" or wildcarding.)
-t

On Thu, 25 Oct 2007, Jonathan Rochkind wrote:


That was a typo in my problem report, I'm afraid. I was actually
searching Jansen, and that still exhibits the problems I mentioned.

I've also moved this conversation to indexdata's own list for this
service, at

http://lists.indexdata.dk/cgi-bin/mailman/listinfo/oclist (thanks to Jason
Ronallo for bringing that list to my attention).

Jonathan


Joshua Santelli wrote:

You're not getting any hits because the name is not Jensen, it's Jansen.
I'm not sure where Jensen came from but the OAIster indexes here have
Jansen.

josh


On 10/24/07 6:04 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote:



I'm messing with SRU search of http://indexdata.dk/opencontent/oaister

I have some behavior I can't explain. There's this article that is in
OAISter, called Resurrection and Appropriation: Reputational
Trajectories, Memory Work, and the Political Use of Historical Figures
by Robert S. Jensen.

I do an SRU search with query:
dc.title = "Resurrection and Appropriation: Reputational Trajectories,
Memory Work, and the Political Use of Historical Figures"
And I find the record, one hit. Good. You too could try, and see what
the DC returned looks like. It does have a dc:creator of Robert S. Jensen.

But I try a search that includes the author.
dc.title = "Resurrection and Appropriation: Reputational Trajectories,
Memory Work, and the Political Use of Historical Figures" and dc.creator
= "Jensen"

0 hits.
and cql.serverChoice = Jensen gives 0 hits

Same using full name Robert S. Jensen (just as it appears in the
record), with cql.serverChoice or dc.creator.

Is this just a bad index, or is something else going on, or what?  As I
try sample searches on title and author, I keep running into false
negatives for things that ought to be in the OAISter index. Sometimes I
can figure out why (title not quite right; title has curly quotes, index
does not, etc.), but in this case I have no idea. But the net result is
it's hard to actually find your known item this way, via an automated
search on known item metadata.
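For anyone wanting to reproduce these tests, an SRU searchRetrieve URL can be assembled like this. A hedged sketch using SRU 1.1 parameter names; whether the endpoint actually expects version 1.1 is an assumption.

```python
from urllib.parse import urlencode

# Hedged sketch of an SRU searchRetrieve request against the
# indexdata OAIster endpoint; version/operation/query/maximumRecords
# are standard SRU 1.1 parameter names.

def sru_search(base, cql, maximum=10):
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "maximumRecords": str(maximum),
    }
    return base + "?" + urlencode(params)

cql = 'dc.title = "Resurrection and Appropriation" and dc.creator = "Jensen"'
print(sru_search("http://indexdata.dk/opencontent/oaister", cql))
```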

Jonathan


--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu









[CODE4LIB] library find and bibliographic citation export?

2007-09-27 Thread Tim Shearer

Hi,

I'm interested to know if anyone working with LibraryFind has begun work
to create a tool for bibliographic export to citation management tools
like refworks, etc.

Thanks!
Tim

+++
Tim Shearer

Web Development Coordinator
The University Library
University of North Carolina at Chapel Hill
[EMAIL PROTECTED]
919-962-1288
+++