[CODE4LIB] NISO Open Discovery Initiative - survey invitation

2012-09-12 Thread Peter Noerr
(with apologies for cross-posting)
The Open Discovery Initiative (ODI), a working group of the National 
Information Standards Organization (NISO), has been formed to develop a 
Recommended Practice related to index-based discovery services for 
libraries. ODI aims to investigate and improve the ecosystem surrounding these 
discovery services, with a goal of broader participation of content providers 
and increased transparency to libraries.

An important component of our work involves gathering information from the key 
stakeholders: libraries, content providers, and developers of discovery 
products.

If you are involved in discovery services, we request that you respond to our 
survey. The survey results will provide essential information to the working group 
as it develops recommended practices related to discovery services. A full 
report on the findings of this survey will be made available publicly on the 
NISO website later this year.

We are especially interested in input from:

• libraries that have implemented or plan to implement a discovery service, and
• organizations that potentially contribute content to one or more of these services:
  o primary publishers,
  o producers of aggregated databases of citation or full-text content for libraries, and
  o creators of abstracting and indexing services.

We anticipate that the survey will take about 20 minutes to complete. 
https://www.surveymonkey.com/s/QBXZXSB

All respondents that identify themselves will be entered into a drawing for one 
of six $25 Amazon e-gift cards, kindly sponsored by Ex Libris and SAGE. These 
respondents will also receive a copy of the aggregated results. Note that any 
results shared will be anonymous and only aggregate data will be released.

In addition, if you are interested in keeping up to date with ODI, please sign 
up to our Interest mailing list - http://www.niso.org/lists/opendiscovery

Thank you
ODI Working Group



Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google

2012-08-27 Thread Peter Noerr
I agree entirely that these would need to be collections of triples, each with its 
own set of attributes/metadata describing the collection. Basically a "record" 
with triples as the data elements.

But I see a bigger problem with the direction this thread has taken so far. The 
use of versions has been conditioned by the use of something like Github as the 
underlying "versioning platform". But Github (like all software versioning 
systems) is based on temporal versions, where each version is, in some way, an 
evolved unit of the same underlying thing - a program or whatever. So the 
versions are related to each other linearly in time, as well as in terms of 
added, improved, or fixed functionality. Yes, the codebase (the underlying 
"thing") can fork or split in a number of ways, but the results are all versions 
of the same thing, progressing through time.

In the existing bibliographic case we have many records which purport to be 
about the same thing, but contain different data values for the same elements. 
And these are the "the versions" we have to deal with, and eventually 
reconcile. They are not descendants of the same original; they are independent 
entities, whether they are recorded as singular MARC records or collections of 
LD triples. I would suggest that at all levels, from the triplet or key/value 
field pair to the triple collection or fielded record, what we have are 
"alternates", not "versions". 
 
Thus the alternates exist at the triple level, and also at the "collection" 
level (the normal bibliographic unit record we are familiar with). And those 
alternates could then be allowed versions, which are the attempts to improve the 
quality (your definition of quality is as good as mine) over time. And within a 
closed group of alternates (of a single bib unit), these versioned alternates 
would (in a perfect world) iterate to a common descendant which had the same 
agreed, authorized set of triples. Of course this would only be the "authorized 
form" for those organizations which recognized the arrangement.
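
To make the distinction concrete, here is a minimal sketch in plain Python (all names and identifiers are hypothetical - this is illustration, not a proposed implementation): alternates are independent descriptions of the same bib entity from different sources, each carrying its own chain of versions, and the "agreed" set is simply what the current versions of all alternates have in common.

from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings

@dataclass
class Version:
    timestamp: str            # when this revision of the alternate was made
    triples: List[Triple]     # the triple collection as of that revision

@dataclass
class Alternate:
    source: str               # the organization this description came from
    versions: List[Version] = field(default_factory=list)

    def current(self) -> List[Triple]:
        return self.versions[-1].triples if self.versions else []

@dataclass
class BibEntity:
    entity_id: str
    alternates: List[Alternate] = field(default_factory=list)

    def agreed_triples(self) -> List[Triple]:
        """Triples present in the current version of every alternate --
        the hoped-for common descendant described above."""
        current_sets = [set(a.current()) for a in self.alternates if a.versions]
        if not current_sets:
            return []
        return sorted(set.intersection(*current_sets))

# Example: two organizations describe the same work, with one differing triple.
worldcat = Alternate("worldcat", [Version("2012-08-01",
    [("ex:work1", "dc:title", "Moby Dick"), ("ex:work1", "dc:date", "1851")])])
hathi = Alternate("hathi", [Version("2012-08-05",
    [("ex:work1", "dc:title", "Moby Dick"), ("ex:work1", "dc:date", "1852")])])

entity = BibEntity("ex:work1", [worldcat, hathi])
print(entity.agreed_triples())   # only the title is agreed; the date is still in dispute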

But, allowing alternates and their versions does allow for a method of tracking 
the original problem of three organizations each copying each other endlessly 
to "correct" their data. In this model it would be an alternate/version spiral 
of states, rather than a flat circle of each changing version with no history, 
and no idea of which was master. (Try re-reading Stuart's "(a), (b), (c)" below 
with the idea of alternates as well as versions of the Datasets; I think it 
becomes clearer what is happening.) There is still no master, but 
at least the state changes can be properly tracked and checked by software 
(and/or humans) so the endless cycle can be addressed - probably by an outside 
(human) decision about the "correct" form of a triple to use for this bib 
entity.

Or this may all prove to be an unnecessary complication.

Peter


> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> stuart yeates
> Sent: Monday, August 27, 2012 3:42 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google
> 
> These have to be named graphs, or at least collections of triples which can 
> be processed through
> workflows as a single unit.
> 
> In terms of LD, the version needs to be defined in terms of:
> 
> (a) synchronisation with the non-bibliographic real world (i.e. Dataset Z 
> version X was released at
> time Y)
> 
> (b) correction/augmentation of other datasets (i.e Dataset F version G 
> contains triples augmenting
> Dataset H versions A, B, C and D)
> 
> (c) mapping between datasets (i.e. Dataset I contains triples mapping between 
> Dataset J version K and
> Dataset L version M (and vice versa))
> 
> Note that a 'Dataset' here could be a bibliographic dataset (records of 
> works, etc), a classification
> dataset (a version of the Dewey Decimal Scheme, a version of the Māori 
> Subject Headings, a version of
> Dublin Core Scheme, etc), a dataset of real-world entities to do authority 
> control against (a dbpedia
> dump, an organisational structure in an institution, etc), or some arbitrary 
> mapping between some
> arbitrary combination of these.
> 
> Most of these are going to be managed and generated using current systems 
> with processes that involve
> periodic dumps (or drops) of data (the dbpedia drops of wikipedia data are a 
> good model here). git
> makes little sense for this kind of data.
> 
> github is most likely to be useful for smaller niche collaborative 
> collections (probably no more than
> a million triples) mapping between the larger collections, and scripts for 
> integrating the collections
> into a sane whole.
> 
> cheers
> stuart
> 
> On 28/08/12 08:36, Karen Coyle wrote:
> > Ed, Corey -
> >
> > I also assumed that Ed wasn't suggesting that we literally use github
> > as our platform, but I do want to remind folks how far w

Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-19 Thread Peter Noerr
On the matter of input of characters, the following email from the Unicode list 
may be of interest to those working through or developing a web UI. Note that 
Key Curry is a work-in-progress, and has received a fair bit of "it doesn't 
have" comment on the Unicode list. But it is a good basis.

Peter
--
From:  unicode-bou...@unicode.org on behalf of Ed Trager [ed.tra...@gmail.com]
Sent:   Tuesday, April 17, 2012 2:41 PM
To: Unicode Mailing List
Subject:    Key Curry : Attempting to make it easy to type world languages 
and orthographies on the web

A long time in the making, I am finally making "Key Curry" public!

"Key Curry" is a web application and set of web components that allows
one to easily type many world languages and specialized orthographies
on the web. Please check it out and provide me feedback:

http://unifont.org/keycurry/ 

In addition to supporting major world languages and orthographies, I
hope that "Key Curry" makes it easy for language advocates and web
developers to provide support for the orthographies of minority
languages -- many of which are not currently supported (or are only
poorly supported) by the major operating system vendors.

Under the hood, the software uses a javascript user interface
framework that I wrote called "Gladiator Components" along with the
popular "jQuery" javascript library as a foundation. I have used HTML
5 technologies such as localStorage to implement certain features.

Currently, Key Curry appears to work well in the latest versions of
Google Chrome, Firefox, and Safari on devices with standard QWERTY
keyboards (e.g. laptops, desktop computers, netbooks, etc.). Recent
versions of Opera and Internet Explorer version 9 appear to have bugs
which limit the ability of Key Curry to operate as designed. The app
is not likely to work well on older versions of any browser. I have
not yet tested IE 10 on Windows 8.

Although Key Curry appears to load flawlessly on the very few Android
and Apple iOS tablet and/or mobile devices that I have "dabbled" with,
the virtual keyboards on those devices are very different from
physical keyboards and I have not yet investigated that problem area
at all - so don't expect it to work on your iPad or other mobile
device.

Constructive criticism and feedback is most welcome. I have many
additional plans for Key Curry "in the works" - but I'll leave further
commentary to another day!

- Ed
-


> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Robert Haschart
> Sent: Thursday, April 19, 2012 2:23 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 
> and MARC21
> 
> On 4/18/2012 12:08 PM, Jonathan Rochkind wrote:
> > On 4/18/2012 11:09 AM, Doran, Michael D wrote:
> >> I don't believe that is the case.  Take UTF-8 out of the picture, and
> >> consider the MARC-8 character set with its escape sequences and
> >> combining characters.  A character such as an "n" with a tilde would
> >> consist of two bytes.  The Greek small letter alpha, if invoked in
> >> accordance with ANSI X3.41, would consist of five bytes (two bytes
> >> for the initial escape sequence, a byte for the character, and then
> >> two bytes for the escape sequence returning to the default character
> >> set).
> >
> > ISO 2709 doesn't care how many bytes your characters are. The
> > directory and offsets and other things count bytes, not characters.
> > (which was, in my opinion, the _right_ decision, for once with marc!)
> >
> > How bytes translate into characters is not a concern of ISO 2709.
> >
> > The majority of non-7-bit-ASCII encodings will have chars that are
> > more than one byte, either sometimes or always. This is true of MARC8
> > (some chars), UTF8 (some chars), and UTF16 (all chars), all of them.
> > (It is not true of Latin-1 though, for instance, I don't think).
> >
> > ISO 2709 doesn't care what char encodings you use, and there's no
> > standard ISO 2709 way to determine what char encodings are used for
> > _data_ in the MARC record. ISO 2709 does say that _structural_
> > elements like field names, subfield names, the directory itself,
> > seperator chars, etc, all need to be (essentially, over-simplifying)
> > 7-bit-ASCII. The actual data itself is application dependent, 2709
> > doesn't care, and 2709 doesn't give any standard cross-2709 way to
> > determine it.
> >
> > That is my conclusion at the moment, helped by all of you all in this
> > thread, thanks!
> 
> The conclusion that I came to in the work I have done on marc4j (which is 
> used heavily by SolrMarc)
> is that for any significant processing of Marc records the only solution that 
> makes sense is to
> translate the record data into Unicode characters as it is being read in.  Of 
> course as you and others
> have stated, determining what the data actually is, in order to correctly 
> transla

Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Peter Noerr
We cried our eyes out in 1976 when this first came to our attention at the BL. 
Even more crying when we couldn't get rid of it in the MARC-I to MARC-II 
conversion (well before MARC21 was even a twinkle) - a lot of tears are 
gathering somewhere.

Peter



> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bill 
> Dueber
> Sent: Tuesday, April 17, 2012 5:50 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 
> and MARC21
> 
> On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero  wrote:
> 
> > Actually Anglo and Francophone centric. And the USMARC style 245 was a
> > poor replacement for the UKMARC approach (someone at the British
> > Library hosted Linked Data meeting wondered why there were punctation
> > characters included in the data in the title field. The catalogers wept 
> > slightly).
> >
> > Simon
> >
> 
> 
> Slightly? I cry my eyes out *every single day* about that. Well, every 
> weekday, anyway.
> 
> 
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library


Re: [CODE4LIB] presenting merged records?

2012-04-03 Thread Peter Noerr
 
contradictory, or orthogonal to each other. Language codes are a good example 
of chaos.
In fact language codes are a type (D) operation to choose one of the many data 
element variants (usually the most frequent) and make it the head, with the 
others relegated in some typographic (or hidden list) way.
There are also specialist treatments such as the ranging mentioned above. Useful 
for dates (how to deal with "c"?) and not much else. Even so, dates can have 
their own idiosyncrasies when dealing with articles rather than monographs: how 
to handle some with dd's, some with mm's, and some with only years, or some 
combination. A "vote" for the most frequent would seem to give the best chance 
of a full publication date, but it can go seriously wrong - especially when 
confronted with dd/mm/yy and mm/dd/yy from different aggregators.
Other specialist processing deals with individual data elements such as journal 
titles and, further afield, phone numbers. Many of these processes can be used 
for virtual records for display, but generally are reserved for more thorough 
processing to normalize and sanitize the data in a later stage, which can 
involve recourse to outside authority tables, and sources, and a lot of other 
stuff.
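
As a concrete (if crude) illustration, the frequency "vote" mentioned above is only a few lines of Python. The helper below is hypothetical, and, as noted, it will happily be fooled by dd/mm/yy versus mm/dd/yy since it compares the strings exactly as given.

from collections import Counter

def vote_for_head(values):
    """Return (head_value, relegated_variants) by simple frequency."""
    counts = Counter(v.strip() for v in values if v and v.strip())
    if not counts:
        return None, []
    head, _ = counts.most_common(1)[0]
    relegated = [v for v in counts if v != head]
    return head, relegated

dates = ["1998", "1998", "c1998", "03/07/1998", "07/03/1998"]
print(vote_for_head(dates))   # ('1998', ['c1998', '03/07/1998', '07/03/1998'])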

And really it is because all this means a lot of work, slows down general 
processing and display, and sometimes produces "iffy" virtual records that we 
have adopted the simplest display: choose just one record as an exemplar, hide 
the rest in a sub-list to keep the display clean, and let the user see for 
him/herself and make decisions. Having said that, we do display a count, and 
often a list of the sources of the records as part of the head, and leave it at 
that. Pragmatism beats perfection.

Peter


> > lot of configuration options, which makes it quite flexible.
> >
> > So, you can choose:
> >
> > 1. Which fields need to be identical for a record to be merged at all.
> > I was using author, title, edition when I first mailed the list, but
> > have found allowing records with different publication dates to be
> > merged just caused too many unpredictable problems, and have now added
> > publication date to the list of required fields. The test for
> > identical authors, dates etc is just a string comparison, so a
> > proportion of records which ought to be merged by these criteria never
> > are, due to typos, variant names, date formats, etc.
> >
> > 2. What to do with fields which differ between records which are being
> > merged. You can choose either 'unique', which appends all unique field
> > values (this is what I use for subject headings, so exactly repeated
> > subject headings are dropped, but variants are kept), and 'longest',
> > which picks the longest field value from all the candidates (this is
> > what I use for abstracts).
> >
> > At the end of the process you have a merged record which has a 'head'
> > with the merged record itself, but which contains each of the original
> > records, so you could potentially do as you suggest and let users see
> > any of the input records if they wanted. However, by default this
> > isn't Marc but an internal format (a processed subset of the Marc
> > input) so it may not be much use to most users.
> >
> > I'm finding the 'head' section is mostly quite usable but does often
> > have individual fields with strange or repeated values (eg values
> > identical apart from punctuation). So I'm doing some post-processing
> > of my own on this, but it's very arbitrary at the moment.
> >
> > Graham
> >
> > On 03/30/12 01:09, Peter Noerr wrote:
> >> Hi Graham,
> >>
> >> What we do in our federated search system, and have been doing for some 
> >> few years, is basically
> give the "designer" a choice of what options the user gets for "de-duped" 
> records.
> >>
> >> Firstly de-duping can be of a number of levels of sophistication, and a 
> >> many of them lead to the
> situation you have - records which are "similar" rather than identical. On 
> the web search side of
> things there are a surprising number of real duplicates (well maybe not 
> surprising if you study more
> than one page of web search engine results), and on Twitter the duplicates 
> well outnumber the original
> posts (many thanks 're-tweet').
> >>
> >> Where we get duplicate records the usual options are: 1) keep the first 
> >> and just drop all the rest.
> 2) keep the largest (assumed to have the most information) and drop the rest. 
> These work well for WSE
> results where they are all almost iden

Re: [CODE4LIB] presenting merged records?

2012-03-29 Thread Peter Noerr
Hi Graham,

What we do in our federated search system, and have been doing for some few 
years, is basically give the "designer" a choice of what options the user gets 
for "de-duped" records.

Firstly de-duping can be of a number of levels of sophistication, and many of 
them lead to the situation you have - records which are "similar" rather than 
identical. On the web search side of things there are a surprising number of 
real duplicates (well maybe not surprising if you study more than one page of 
web search engine results), and on Twitter the duplicates well outnumber the 
original posts (many thanks 're-tweet').

Where we get duplicate records the usual options are: 1) keep the first and 
just drop all the rest. 2) keep the largest (assumed to have the most 
information) and drop the rest. These work well for WSE results where they are 
all almost identical (the differences often are just in the advertising 
attached to the pages and the results), but not for bibliographic records.

Less draconian is 3) Mark all the duplicates and keep them in the list (so you 
get 1, 2, 3, 4, 5, 5.1, 5.2, 5.3, 6, ...). This groups all the similar records 
together under the sort key of the first one, and does enable the user to 
easily skip them.

More user friendly is 4) Mark all duplicates and hide them in a sub-list 
attached to the "head" record. This gets them out of the main display, but 
allows the user who is interested in that "record" to expand the list and see 
the variants. This could be of use to you.
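
For what it's worth, the grouping behind options 3) and 4) can be sketched in a few lines of Python. The field names and the crude normalization below are invented for illustration; real duplicate matching is, of course, much messier.

def dedupe_group(records, key_fields=("title", "author")):
    """Group records whose key fields match (after crude normalization)."""
    groups = {}          # key -> head record (first seen), with variants attached
    order = []           # preserve the original result order for display
    for rec in records:
        key = tuple((rec.get(f) or "").lower().strip() for f in key_fields)
        if key in groups:
            groups[key]["variants"].append(rec)   # hide in the sub-list
        else:
            head = dict(rec, variants=[])
            groups[key] = head
            order.append(head)
    return order

results = [
    {"title": "Moby Dick", "author": "Melville, Herman", "source": "Lib A"},
    {"title": "moby dick ", "author": "Melville, Herman", "source": "Lib B"},
    {"title": "Omoo", "author": "Melville, Herman", "source": "Lib A"},
]
for head in dedupe_group(results):
    print(head["title"], "- variants hidden:", len(head["variants"]))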

After that we planned to do what you are proposing and actually merge record 
content into a single virtual record, and worked on algorithms to do it. But 
nobody was interested. All our partners (who provide systems to lots of 
libraries - public, academic, and special) decided that it would confuse 
their users more than it would help. I have my doubts, but they spoke and we 
put the development on ice.

I'm not sure this will help, but it has stood the test of time, and is well 
used in its various guises. Since no-one else seems interested in this topic, 
you could email me off list and we could discuss what we worked through in the 
way of algorithms, etc.

Peter


> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of graham
> Sent: Wednesday, March 28, 2012 8:05 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] presenting merged records?
> 
> Hi Michael
> 
> On 03/27/12 11:50, Michael Hopwood wrote:
> > Hi Graham, do I know you from RHUL?
> >
> Yes indeed :-)
> 
> > My thoughts on "merged records" would be:
> >
> > 1. don't do it - use separate IDs and just present links between related 
> > manifestations; thus
> avoiding potential confusions.
> 
> In my case, I can't avoid it as it's a specific requirement: I'm doing a 
> federated search across a
> large number of libraries, and if closely similar items aren't merged, the 
> results become excessively
> large and repetitive. I'm merging all the similar items, displaying a summary 
> of the merged
> bibliographic data, and providing links to each of the libraries with a copy. 
>  So it's not really
> FRBRization in the normal sense, I just thought that FRBRization would lead 
> to similar problems, so
> that there might be some well-known discussion of the issues around... The 
> merger of the records does
> have advantages, especially if some libraries have very underpopulated 
> records (eg subject fields).
> 
> Cheers
> Graham
> 
> >
> > http://www.bic.org.uk/files/pdfs/identification-digibook.pdf
> >
> > possible relationships - see 
> > http://www.editeur.org/ONIX/book/codelists/current.html - lists 51
> (manifestation)and 164 (work).
> >
> > 2. c.f. the way Amazon displays rough and ready categories (paperback,
> > hardback, audiobooks, *ahem* ebooks of some sort...)
> >
> > On dissection and reconstitution of records - there is a lot of talk going 
> > on about RDFizing MaRC
> records and re-using in various ways, e.g.:
> >
> > http://www.slideshare.net/JenniferBowen/moving-library-metadata-toward
> > -linked-data-opportunities-provided-by-the-extensible-catalog
> >
> > Cheers,
> >
> > Michael
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> > Of graham
> > Sent: 27 March 2012 11:06
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: [CODE4LIB] presenting merged records?
> >
> > Hi
> >
> > There seems to be a general trend to presenting merged records to users, as 
> > part of the move towards
> FRBRization. If records need merging this generally means they weren't 
> totally identical to start with,
> so you can end up with conflicting bibliographic data to display.
> >
> > Two examples I've come across with this: Summon can merge
> > print/electronic versions of texts, so uses a new 'merged' material
> > type of 'book/ebook' (it doesn't yet seem to have all the other
> > possible permutations, eg book/audiobook). Pazpa

Re: [CODE4LIB] NON-MARC ILS?

2012-03-14 Thread Peter Noerr
There was a system developed back in the '80s which stored its records 
internally in a direct Entity-Relationship database and allowed inter-record 
linking and a rather hyperlink-like data structure. BUT... that was all 
internal. It allowed some very nice OPAC features and possibly easier 
cataloguing as authorities (basically subjects and authors) could be created 
once and linked into a bib record. Externally the system exchanged records in 
MARC. In fact in at least 15 different flavors of MARC. (It was built in Europe 
and was used to provide a service as a transformer for converting USMARC to 
various other MARC for European distribution.)

MARC was, and is, an interchange format, so it is the format used to ship bib 
records between ILSs. It doesn't have to be used internally, as the above system 
(which sold over 3,000 copies and has about 1,000 still active today, although 
it has been off the market for over 13 years) and InMagic and others show. In 
fact, as someone said previously, almost all the commercial systems do store the 
MARC records - not in ISO 2709 format, but shredded into some relational 
structure of tuples. But MARC is the language they all speak to each other. To 
change that would need an infrastructure, as also mentioned previously in this 
thread, to allow existing ILSs and repositories, based on MARC exchange, to 
interoperate with new ILSs, based on some other exchange. And that does mean 
hubs and repositories of transforming capabilities with very sophisticated 
semantics - and there really isn't any commercial case to create them. 

And all of this is a long way from Matt's actual question. 

Peter

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Bigwood, David
> Sent: Wednesday, March 14, 2012 12:49 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [*SP* 22%] Re: [CODE4LIB] NON-MARC ILS?
> 
> Yes, there are non-MARC systems out there. I think InMagic has some.
> LibraryThing could be used and doesn't require MARC.  There are some home 
> inventory programs that
> might do for a small church library or such.
> 
> But what is the problem with MARC? The structure is fairly compact, compared 
> to XML for instance. It
> does lack some granularity I'd like to see, but that would only make it more 
> complex if flexible. It
> would also be nice if it were possible to do more linking from the record. 
> But this only increases the
> complexity and makes it more difficult to local catalogers. Personally, I 
> kind of like MODS, but I'm
> not sure how much it would save.
> 
> Is the problem with the rules on how to fill the MARC record? That has mostly 
> to do with AACR. The
> bibliographic universe is complex and getting more so. The rules for 
> description and access must take
> that into account. It is true that the small public library won't need the 
> same detail as a special
> collection or research university. Maybe there could be a simplified/stripped 
> down AACR? Or maybe RDA,
> the new standard will have that basic option?
> 
> Or is your problem with the fields, their order and associated punctuation? 
> That is ISBD or FRBR. Both
> are based on common sense and what we experience as the necessary elements 
> from our work. They are not
> based on research on what the user wants and does. However, that gets to the 
> question "Who is the
> user?" The elementary child writing a report on the Civil War or a grad 
> student writing their
> dissertation, the mechanic looking for a wiring diagram for a 69 Ford, or a 
> birdwatcher planning their
> trip, the person looking for do your own divorce? Maybe Google searches could 
> provide some answers but
> do people look for different things and search differently in the library and 
> on-line? Fertile ground
> for some theses.
> 
> The other thing to consider is the huge number of records available in MARC 
> format. A small public
> library probably has very little original cataloging to do. Local high school 
> yearbooks, some self-
> published family histories. Doing things differently locally would mean all 
> the common stuff would
> have to be done in-house, not just down loaded.
> 
> Sincerely,
> David Bigwood
> dbigw...@gmail.com
> Lunar and Planetary Institute
> Catalogablog: http://catalogablog.blogspot.com
> 
> On Mar 14, 2012, at 8:59 AM, Matt Amory wrote:
> 
> > Is there a full-featured ILS that is not based on MARC records?
> > I know we love complexity, but it seems to me that my public library
> > and its library network and maybe even every public library could
> > probably do without 95% of MARC Fields and encoding, streamline
> > workflows and save $ if there were a simpler standard.
> > Is this what an Endeca-based system is about, or do those rare birds
> > also use MARC in the background?
> > Forgive me if the question has been hashed and rehashed over the
> years...
> >
> > --
> > Matt Amory
> > (917) 771-4157
> > matt.am...@gmail.com
> 

Re: [CODE4LIB] "Repositories", OAI-PMH and web crawling

2012-02-25 Thread Peter Noerr
This post veers nearer to something I was going to add as an FYI, so here 
goes...

FYI: NISO has recently started a working group to study best practices for 
discovery services. The ODI (=Open Discovery Initiative) working group is 
hoping to look at exactly this issue (how should a content provider tell a 
content requestor what it can "have") among others (how to convey commercial 
restrictions, how to produce statistics meaningful to providers, discovery 
services, and consumers of the discovery service), and hopefully produce 
guidelines on procedures and formats, etc. for this. 

This is a new working group and its timescale doesn't call for any deliverables 
until Q3 of 2012, so it is a bit late to help Owen, but anyone who is 
interested in this may want to follow, from time to time, the NISO progress. 
Look at www.niso.org and find the ODI working group. If you're really 
interested contact the group to offer thoughts. And many of you may be 
contacted by a survey to find out your thoughts as part of the process, anyway. 
Just like the long reach of OCLC, there is no escaping NISO.

Peter   

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe 
> Hourcle
> Sent: Friday, February 24, 2012 10:20 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] "Repositories", OAI-PMH and web crawling
> 
> On Feb 24, 2012, at 9:25 AM, Kyle Banerjee wrote:
> 
> >>
> >> One of the questions this raises is what we are/aren't allowed to do
> >> in terms of harvesting full-text. While I realise we could get into
> >> legal stuff here, at the moment we want to put that question to one
> >> side. Instead we want to consider what Google, and other search
> >> engines, do, the mechanisms available to control this, and what we
> >> do, and the equivalent mechanisms - our starting point is that we
> >> don't feel we should be at a disadvantage to a web search engine in
> >> our harvesting and use of repository records.
> >>
> >> Of course, Google and other crawlers can crawl the bits of the
> >> repository that are on the open web, and 'good' crawlers will obey
> >> the contents of robots.txt We use OAI-PMH, and while we often see
> >> (usually general and sometimes
> >> contradictory) statements about what we can/can't do with the
> >> contents of a repository (or a specific record), it feels like there
> >> isn't a nice simple mechanism for a repository to say "don't harvest this 
> >> bit".
> >>
> >
> > I would argue there is -- the whole point of OAI-PMH is to make stuff
> > available for harvesting. If someone goes to the trouble of making
> > things available via a protocol that exists only to make things
> > harvestable and then doesn't want it harvested, you can dismiss them
> > as being totally mental.
> 
> I see it like the people who request that their pages not be cached elsewhere 
> -- they want to make
> their object 'discoverable', but they want to control the access to those 
> objects -- so it's one thing
> for a search engine to get a copy, but they don't want that search engine 
> being an agent to distribute
> copies to others.
> 
> Eg, all of the journal publishers who charge access fees -- they want people 
> to find that they have a
> copy of that article that you're interested in ... but they want to collect 
> their $35 for you to read
> it.
> 
> In the case of scientific data, the problem is that to make stuff 
> discoverable, we often have to
> perform some lossy transformation to fit some metadata standard, and those 
> standards rarely have
> mechanisms for describing error (accuracy, precision, etc.).  You can do some 
> science with the catalog
> records, but it's going to introduce some bias into your results, so you're 
> typically better of
> getting the data from the archive.  (and sometimes, they have nice clean 
> catalogs in FITS, VOTable,
> CDF, NetCDF, HDF or whatever their discipline's preferred data format is)
> 
> ...
> 
> Also, I don't know if things have changed in the last year, but I seem to 
> remember someone mentioning
> at last year's RDAP (Research Data Access & Preservation) summit that Google 
> had coordinated with some
> libraries for feeds from their catalogs, but was only interested in books, 
> not other objects.
> 
> I don't know how other search engines might use data from OAI-PMH, or if 
> they'd filter it because they
> didn't consider it to be information they cared about.
> 
> -Joe


Re: [CODE4LIB] Obvious answer to registration limitations

2012-01-09 Thread Peter Noerr
One possibility for this is the InDiCo conference management system produced by 
CERN which was used for the 2005 ELAG meeting. It has been used by hundreds of 
other conferences and workshops since then. It is open source and written in 
Python. So it could be adapted/extended to specific C4L needs - like time 
diverse registration and the like. See what conferences it is managing at 
http://indico.cern.ch. Software details at http://indico-software.org. It is 
possible CERN would allow C4L management on its servers. If not, and it moved 
host from year to year, at least the functionality and features would be the 
same and could be incrementally enhanced as C4L requirements evolved, and there 
would be some (lingering) expertise in its use.

Peter 

> 
> On Tue, Jan 3, 2012 at 7:08 PM, David Friggens  wrote:
> 
> 
> I think the thing that would move these ideas along is for someone to write 
> the registration system
> that we're talking about (or find one that does what we want that we could 
> repurpose).  In my humble
> opinion, ideas that require more manual work on the part of the
> host(s) are less likely to happen; but, if there was a system that would do 
> what we want (and handle
> the crush of registration), I think the community would happily jump behind 
> it -- registration has
> always been an issue.
> 
> So, that said, I'll take one step backward and let someone else step forward 
> (by standing still) to
> volunteer to write it... as they say, "running code wins."
> 
> Kevin


Re: [CODE4LIB] Obvious answer to registration limitations

2011-12-22 Thread Peter Noerr
+1

Peter Noerr
MuseGlobal

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Schneider
> Sent: Thursday, December 22, 2011 11:11 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Obvious answer to registration limitations
> 
> >
> > Also, is there any interest in a San Francisco Bay Area Code For
> > Libraries Regional Affiliate (code4lib-sfbay for short)?
> >
> 
> +1
> 
> If our bandwidth issues on campus get resolved, we'd offer our site, too.
> Our Valley Center for Performing Arts has a smaller theater on the lower 
> level that could work.
> Exploratory site visits welcome.
> 
> Karen G. Schneider
> Holy Names University


Re: [CODE4LIB] My crazed idea about dealing with registration limitations

2011-12-22 Thread Peter Noerr
Crazy variation number 3. Have two tracks which are identical, but time shifted 
by half a day (or some other convenient unit). The presenters talk twice on the 
same day - in the morning for track A and the afternoon for track B. That way 
there is no "speaker gulag", no time over-run (though, following Declan's 
point, how much time is left out of the week after travelling, so why not the 
whole week), and you get a chance to hear a really interesting presentation 
twice - or miss it twice! Yes the interactions would be different (I would hope 
so), but that may be an advantage. Questions can be asked that got the time 
chop previously, more details can be added the second time round, attendees 
have more to compare over lunch/beer. The problem would be a herd following 
one presentation, so we have 500 in one track and only 3 in the other. Room size 
limits (enforced) could help relieve that, or labeling people to their track 
and only allowing/encouraging mixing at "intermediate events".

And streaming to a "satellite" meeting, say here in the Bay Area, where 
10-15-20 people could get together informally, gives them a chance to interact 
amongst themselves, if not with the whole group. (OK, that is crazy idea #4.)

Peter

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Corey 
> A Harper
> Sent: Thursday, December 22, 2011 8:44 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] My crazed idea about dealing with registration 
> limitations
> 
> Cary,
> 
> Good to know about your extensive experience w/ streaming.
> 
> If you'll be in Seattle, would you be willing to add your name to the "Video 
> Committee" listing?
> http://wiki.code4lib.org/index.php/2012_committees_sign-up_page#Video_Committee
> 
> Having people who actually know what they're doing involved in this effort 
> *this* year will help
> ensure that we're actually able to pull it off as effectively as IU did...
> 
> Thanks,
> -Corey
> 
> 
> On Thu, Dec 22, 2011 at 10:42 AM, Cary Gordon  wrote:
> > This is definitely doable, and potentially effective for a single
> > track conference.
> >
> > I have been doing streaming as a volunteer for eight years and it
> > keeps getting easier.
> >
> > Cary
> >
> > On Thu, Dec 22, 2011 at 7:33 AM, Wilfred Drew  wrote:
> >> Here is another crazy idea; stream the event live for those who can't get 
> >> registered for the face
> to face version and provide a lower registration fee for them.
> >>
> >>
> >> -
> >> Wilfred (Bill) Drew, M.S., B.S., A.S.
> >> Assistant Professor
> >> Librarian, Systems and Tech Services/Electronic Resources/Serials
> >> Tompkins Cortland Community College  (TC3) Library:
> >> http://www.tc3.edu/library/
> >> Dryden, N.Y. 13053-0139
> >> Follow the library: http://twitter.com/TC3Library
> >> E-mail: dr...@tc3.edu
> >> Phone: 607-844-8222 ext.4406
> >> SKYPE/Twitter:BillDrew4
> >> SMS/TXT Me: 6072182217
> >> Website: http://BillTheLibrarian.com
> >> StrengthsQuest Strengths: Ideation, Input, Learner, Command,
> >> Analytical http://www.facebook.com/billdrew "One thing about eBooks
> >> that most people haven't thought much is that eBooks are the very
> >> first thing that we're all able to have as much as we want other than 
> >> air." -- Michael Hart,
> Project Gutenberg. Please consider the environment before printing this 
> e-mail or document.
> >
> >
> >
> > --
> > Cary Gordon
> > The Cherry Hill Company
> > http://chillco.com
> 
> 
> 
> --
> Corey A Harper
> Metadata Services Librarian
> New York University Libraries
> 20 Cooper Square, 3rd Floor
> New York, NY 10003-7112
> 212.998.2479
> corey.har...@nyu.edu


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Richard Wallis
> Sent: Tuesday, December 13, 2011 3:16 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
> 
> On 13 December 2011 22:17, Peter Noerr  wrote:
> 
> > I agree with Karen below that a record seems more bounded and static,
> > whereas a description varies according to need. And that is the
> > distinction I was trying to get at: that the item stored in some
> > database is everything unique about that entity - and is static, until
> > some data actually changes, whereas the description is built at run
> > time for the user and may contain some data from the item record, and
> > some aggregated from other, linked, item records. The records all have
> > long term existence in databases and the like, whereas the description
> > is a view of all that stored data appropriate for the moment. It will
> > only be stored as a processing intermediate result (as a record, since
> > its contents are now fixed), and not long term, since it would be
> > broken up to bits of entity data and stored in a distributed linked
> > fashion (much like, as I understand it, the BL did when reading MARC
> > records and storing them as entity updates.)
> >
> 
> Yes.  However those descriptions have the potential to be as permanent as the 
> records that they were
> derived from.  As in the BL's case where the RDF is stored, published and 
> queried in [Talis]
> Kasabi.com:
> http://kasabi.com/dataset/british-national-bibliography-bnb
>

I would argue that they are stored permanently as multiple records holding the 
data about each of the individual entities derived from the original single 
MARC record. In my mind (for this discussion) anything that is stored is a 
record. It may be a single agglutinative record such as MARC, or the same data 
may be split amongst records for the work, the author, the subjects, the 
physical instance, the referenced people, etc. But the data for each of those 
is stored in a record unique to that entity (or in records for other entities 
linked to that entity), so the whole data set of attributes gets spread around 
as fields in various records about various entities - and the links between 
them; let us not forget the very real importance of the links for carrying 
data. 

When a user wants to view the information about this title, then a description 
is assembled from all the stored records and presented to the user. It is, 
almost by definition (as I am viewing this), an ephemeral view (a virtual 
record - one which is not stored complete anywhere) for this user. If the user 
stores this record in a store using the same mechanisms and data model, then 
the constituent data values will be dispersed to their entity records again. 
(If the user wants to process the record, then it may well be stored as a 
whole, since it contains all the information needed for whatever the current 
task is, and the processed record may be discarded or stored permanently again 
in a linked data net as data values in various entity records within that 
model. Or it may be stored whole in an old fashioned "record" oriented 
database.)
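
A toy sketch of that round trip, in plain Python with invented identifiers and fields, may make the stored-record versus assembled-description distinction clearer: the stored units are the individual entity records plus their links, and the "description" is built only at request time.

entity_store = {
    "work:1":   {"type": "work", "title": "Moby Dick", "author": "person:7",
                 "subjects": ["topic:whaling"]},
    "person:7": {"type": "person", "name": "Melville, Herman", "born": "1819"},
    "topic:whaling": {"type": "topic", "label": "Whaling"},
}

def assemble_description(work_id):
    """Build an ephemeral view of a work by following its links."""
    work = entity_store[work_id]
    author = entity_store[work["author"]]
    subjects = [entity_store[s]["label"] for s in work["subjects"]]
    # The dict returned here is the "virtual record": fixed once built,
    # but nothing in the store looks like it.
    return {"title": work["title"], "author": author["name"],
            "subjects": subjects}

print(assemble_description("work:1"))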

 
> 
> >
> > Having said all that, I don't like the term "description" as it
> > carries a lot of baggage, as do all the other terms. But I'm stuck for 
> > another one.
> >
> 
> Me too.  I'm still searching for a budget airline term - no baggage!

How about something based on South West - where bags fly free! Though I can't 
make any sort of acronym starting with "SW"!
> 
> ~Richard.
> 
> --
> Richard Wallis
> Technology Evangelist, Talis
> http://consulting.talis.com
> Tel: +44 (0)7767 886 005
> 
> Linkedin: http://www.linkedin.com/in/richardwallis
> Skype: richard.wallis1
> Twitter: @rjw
> IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
Being no longer in Europe, I had completely missed the currently hot potato 
definition of EMU. But it had a nice feel to it 

I agree with Karen below that a record seems more bounded and static, whereas a 
description varies according to need. And that is the distinction I was trying 
to get at: that the item stored in some database is everything unique about 
that entity - and is static, until some data actually changes, whereas the 
description is built at run time for the user and may contain some data from 
the item record, and some aggregated from other, linked, item records. The 
records all have long term existence in databases and the like, whereas the 
description is a view of all that stored data appropriate for the moment. It 
will only be stored as a processing intermediate result (as a record, since its 
contents are now fixed), and not long term, since it would be broken up to bits 
of entity data and stored in a distributed linked fashion (much like, as I 
understand it, the BL did when reading MARC records and storing them as entity 
updates.)

Having said all that, I don't like the term "description" as it carries a lot 
of baggage, as do all the other terms. But I'm stuck for another one.

Peter

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coyle
> Sent: Tuesday, December 13, 2011 12:23 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
> 
> Quoting Simon Spero :
> 
> > On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis
> > wrote:
> >
> >
> >> However, I think you are thinking in the right direction - I am
> >> resigning myself to just using the word 'description'.
> >
> >
> > Q: In your definition, can *descriptions *be put* * into 1:1 correspondence
> > with records (where a record is a atomic asserted set of propositions about
> > a resource)?
> 
> Yes, I realize that you were asking Richard, but I'm a bit forward, as
> we know. I do NOT see a description as atomic in the sense that a
> record is atomic. A record has rigid walls, a description has
> permeable ones. A description always has the POTENTIAL to have a bit
> of unexpected data added; a record cuts off that possibility.
> 
> That said, I am curious about the permeability of the edges of a named
> graph. I don't know their degree of rigidity in terms of properties
> allowed.
> 
> kc
> 
> >
> > Simon
> >
> 
> 
> 
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Peter Noerr
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coyle
> Sent: Sunday, December 11, 2011 3:47 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
> 
> Quoting Richard Wallis :
> 
> 
> > You get the impression that the BL "chose a subset of their current
> > bibliographic data to expose as LD" - it was kind of the other way around.
> > Having modeled the 'things' in the British National Bibliography
> > domain (plus those in related domain vocabularis such as VIAF, LCSH,
> > Geonames, Bio, etc.), they then looked at the information held in
> > their [Marc] bib records to identify what could be extracted to populate it.
> 
> Richard, I've been thinking of something along these lines myself, especially 
> as I see the number of
> "translating X to RDF" projects go on. I begin to wonder what there is in 
> library data that is
> *unique*, and my conclusion is: not much. Books, people, places, topics: they 
> all exist independently
> of libraries, and libraries cannot take the credit for creating any of them. 
> So we should be able to
> say quite a bit about the resources in libraries using shared data points -- 
> and by that I mean, data
> points that are also used by others. So once you decide on a model (as BL 
> did), then it is a matter of
> looking *outward* for the data to re-use.

Trying to synthesize what Karen, Richard and Simon have bombarded us with here 
leads me to conclude that linking to existing (or to be created) external data 
(ontologies and representations) is a matter of being sure what the system's 
current user's context is, and being able to modify the external data brought 
into the user's virtual EMU (see below *** before reading further). I think 
Simon is right that "records" will increasingly become virtual in that they are 
composed as needed by this user for this purpose at this time. We already see 
this in practice in many uses, from adding cover art to book MARC records to 
just adding summary information to a "management level" report. Being able to 
link from a "book" record to foaf:person and bib:person records, and extract 
data elements from each as they are needed right now, should not be too 
difficult. As well as a knowledge of the current need, it requires a 
semantically based mapping of the different elements of those "people" 
representations. The neat part is that the total representation for that person 
may be expressed through both foaf: and bib: facets from a single EMU which 
contains all things known about that person. So our two requests for linked 
data may - in fact should - be mining the same resource, which will translate 
the data to the format we ask for each time; we will then combine those 
representations back into a single collapsed data set.

I think Simon (maybe Richard, maybe all of you) was working towards a single 
unique EMU for the entity which holds all unique information about it for a 
number of different uses/scenarios/facets/formats. Of course deciding on what 
is unique and what is obtained from some more granular breakdown is another 
issue. (Some experience with this "onion skin" modeling lies deep in my past, 
and may need dredging up.)

It is also important, IMHO, to think about the repository form of entity data 
(the EMU) and the transmission form (the data sent to a requesting system when 
it asks for "foaf:person" data). They are different and have different 
requirements. If you are going to allow all these entity data elements to be 
viewed through a "format filter" then we have a mixed model, but basically a 
whole-part between the EMU and the transmission form. (e.g. the full data set 
contains the person's current address, but the transmitted response sends only 
the city). Argue amongst yourselves about whether an address is a separate 
entity and is linked to or not - it makes a simple example to consider it as 
part of the EMU.

All of this requires that we think of the web of data as being composed not of 
static entities with a description which is fixed at any snapshot in time, but 
being dynamic in that what two users see of the same entity may be different at 
exactly the same instant. So we need not only a descriptive model structure, but also a 
set of semantic mappings, a context resolution transformation, and the system 
to implement it each time a link to related data is followed.
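
To illustrate the whole-part idea (and only to illustrate - the property names and filters below are invented), the EMU-to-transmission-form step is essentially a projection chosen by the requested facet:

person_emu = {
    "name": "Melville, Herman",
    "born": "1819",
    "address": {"street": "104 East 26th Street", "city": "New York"},
    "works": ["work:1"],
}

facet_filters = {
    # facet name -> function from the stored EMU to the transmitted subset
    "foaf:person": lambda emu: {"foaf:name": emu["name"],
                                "foaf:based_near": emu["address"]["city"]},
    "bib:person":  lambda emu: {"bib:name": emu["name"],
                                "bib:birthYear": emu["born"],
                                "bib:authorOf": list(emu["works"])},
}

def transmit(emu, facet):
    """Apply the requested facet's filter to the stored EMU."""
    return facet_filters[facet](emu)

print(transmit(person_emu, "foaf:person"))   # city only, no street address
print(transmit(person_emu, "bib:person"))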

> 
> I maintain, however, as per my LITA Forum talk [1] that the subject headings 
> (without talking about
> quality thereof) and classification designations that libraries provide are 
> an added value, and we
> should do more to make them useful for discovery.
> 
> 
> >
> > I know it is only semantics (no pun intended), but we need to stop
> > using the word 'record' when talking about the future description of 
> > 'things' or
> > entities that are then linked together.   That word has so many built in
> 

Re: [CODE4LIB] Models of MARC in RDF

2011-12-05 Thread Peter Noerr
See historical comment in text below. But, to look forward -

It seems to me that we should be able to design a model with graceful 
degradation from the full MARC data element set (vocabulary, if you insist) to a 
core set which allows systems to fill in what they have and, on the receiving 
end, extract what they can find. Each system can work with its own schema, if 
it must, as long as the mapping for its level of detail against whatever 
designated level of detail it wishes to accept in the exchange format is 
created first. Obviously greater levels of detail cannot be inferred from 
lesser, and so many systems would be working with less than the data they would 
like, or create locally, but that is the nature of bibliographic data - it is 
never complete, or it must be processed assuming that is the case.

Using RDF and entity modeling it should be possible to devise a (small) number 
of levels from a basic core set (akin to DC, if not semantically identical) 
through to a "2,500 attribute*" person authority record (plus the other bib 
entities), and produce pre-parsers which will massage these to what the ILS (or 
other repository/system) is comfortable with. Since the "receiving system" is 
fixed for any one installation it does not need the complexity we build into 
our fed search platforms, and converters would be largely re-usable.

So, what about a Russian doll bibliographic schema? (Who gets to decide on what 
goes in which level is for years of committee work - unemployment solved!)


* number obtained from a line count from 
http://www.loc.gov/marc/authority/ecadlist.html - so rather approximate.
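
Returning to the Russian doll idea: a very rough sketch, in plain Python with invented level definitions (deciding what actually belongs in each level is the committee's job), shows that graceful degradation amounts to keeping only the elements the receiving level understands.

LEVELS = {
    "core":     {"title", "creator", "date", "identifier"},          # DC-ish
    "standard": {"title", "creator", "date", "identifier",
                 "edition", "publisher", "subjects", "language"},
    "full":     None,   # None = accept every element the sender provides
}

def degrade(record, level):
    """Drop the elements a system at this level does not understand."""
    allowed = LEVELS[level]
    if allowed is None:
        return dict(record)
    return {k: v for k, v in record.items() if k in allowed}

detailed = {"title": "Moby Dick", "creator": "Melville, Herman", "date": "1851",
            "identifier": "urn:x-example:1", "edition": "1st", "publisher": "Harper",
            "subjects": ["Whaling"], "language": "eng",
            "provenance_note": "Presentation copy"}

print(degrade(detailed, "core"))       # what a minimal system keeps
print(degrade(detailed, "standard"))   # a fuller, but still lossy, view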

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Jonathan Rochkind
> Sent: Monday, December 05, 2011 10:57 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Models of MARC in RDF
> 
> On 12/5/2011 1:40 PM, Karen Coyle wrote:
> >
> > This brings up another point that I haven't fully grokked yet: the use
> > of MARC kept library data "consistent" across the many thousands of
> > libraries that had MARC-based systems.
> 
> Well, only somewhat consistent, but, yeah.
> 
> > What happens if we move to RDF without a standard? Can we rely on
> > linking to provide interoperability without that rigid consistency of
> > data models?
> 
> Definitely not. I think this is a real issue.  There is no magic to "linking" 
> or RDF that provides
> interoperability for free; it's all about
> the vocabularies/schemata -- whether in MARC or in anything else.
> (Note different national/regional  library communities used different 
> schemata in MARC, which made
> interoperability infeasible there. Some still do, although gradually people 
> have moved to Marc21
> precisely for this reason, even when Marc21 was less powerful than the MARC 
> variant they started with).

Just a comment about the "good old days" when we had to work with USMARC, 
UKMARC, DANMARC, MAB1, AUSMARC, and so on. "interoperability infeasible" was 
not the situation. It was perfectly possible to convert records from one format 
to another - with some loss of data into the less specific format of course. 
Which meant that a "round trip" was not possible. But "major elements" were 
present in all and that meant it was practically useful to do it. We did this 
at the British Library when I was there, and, as a commercial ILS vendor, we did 
it as a service for OCLC (remember them?). It did involve 
specific coding, and an internal database system built to accommodate the 
variability. 

> 
> That is to say, if we just used MARC's own implicit vocabularies, but output 
> them as RDF, sure, we'd
> still have consistency, although we
> wouldn't really _gain_ much.On the other hand, if we switch to a new
> better vocabulary -- we've got to actually switch to a new better vocabulary. 
>  If it's just "whatever
> anyone wants to use", we've made it VERY difficult to share data, which is 
> something pretty darn
> important to us.
> 
> Of course, the goal of the RDA process (or one of em) was to create a new 
> schema for us to
> consistently use. That's the library community effort to maintain a common 
> schema that is more
> powerful and flexible than MARC.  If people are using other things instead, 
> apparently that failed, or
> at least has not yet succeeded.


Re: [CODE4LIB] Examples of visual searching or browsing

2011-10-28 Thread Peter Noerr
This looks really colorful, but how does it aid searching, or browsing?

The pie chart is useful for a collections development librarian to see how the 
collection is distributed across broad subject areas.

How does it help me, a user, searching for books on Dentistry (yes, they are 
there, all 9443 of them), to know that the biggest collections are in Asian 
history and languages (and books)? What functionality does the visualization 
add to the list of topics given below? It's organized by call number (starting 
at 3 o'clock?), so I don't even have alphabetic headings to help. And the 198 
general works and 375 dictionaries just disappear. 

It looks nice, but exactly what searching purpose does it enhance - either by 
its existence, or over the alternative list display (boring, but complete)?


Peter

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Julia 
> Bauder
> Sent: Friday, October 28, 2011 9:55 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Examples of visual searching or browsing
> 
> This is all fabulous, thank you! MapFast and the HathiTrust visualizations 
> are exactly the kinds of
> things I was looking for, and the tree-mapping idea also sounds like a very 
> good one for visualizing
> collections.
> 
> Thanks!
> 
> On Fri, Oct 28, 2011 at 11:11 AM, Margaret Anderson  wrote:
> 
> > Take a look at a visualization of HathiTrust works by call number
> >
> > http://www.hathitrust.org/visualizations_callnumbers
> >
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> > Of Julia Bauder
> > Sent: Thursday, October 27, 2011 4:27 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: [CODE4LIB] Examples of visual searching or browsing
> >
> > Dear fans of cool Web-ness,
> >
> > I'm looking for examples of projects that use visual(=largely non-text
> > and
> > non-numeric) interfaces to let patrons browse/search collections.
> > Things like the GeoSearch on North Carolina Maps[1], or projects that
> > use Simile's Timeline or Exhibit widgets[2] to provide access to
> > collections (e.g., what's described here:
> > https://letterpress.uchicago.edu/index.php/jdhcs/article/download/59/7
> > 0), or in-the-wild uses of Recollection[3]. I'm less interested in
> > knowing about tools (although I'm never *uninterested* in finding out
> > about cool tools) than about production or close-to-production sites
> > that are making good use of these or similar tools to provide visual,
> > non-linear access to collections. Who's doing slick stuff in this area
> > that deserves a look?
> >
> > Thanks!
> >
> > Julia
> >
> > [1] http://dc.lib.unc.edu/ncmaps/search.php
> > [2] http://www.simile-widgets.org/
> > [3] http://recollection.zepheira.com/
> >
> >
> >
> >
> > *
> >
> > Julia Bauder
> >
> > Data Services Librarian
> >
> > Interim Director of the Data Analysis and Social Inquiry Lab (DASIL)
> >
> > Grinnell College Libraries
> >
> >  Sixth Ave.
> >
> > Grinnell, IA 50112
> >
> >
> >
> > 641-269-4431
> >


Re: [CODE4LIB] JHU integration of PD works

2011-06-16 Thread Peter Noerr
Numerous comments from today's posts.

As to Jonathan on complexity, resources and "we've got it working". Indeed you 
have and it is a cool looking UI and functionality behind it. I did not mean to 
imply that you had not got a working system, or that anyone else could not do 
it (see Dan's comments on his available JavaScript). The important word in what 
I wrote was "scale". Many of the content providers, ILS vendors, and even 
enterprise software producers who have come to use Muse as a third party 
middleware platform have done so after they have started on this type of work. 
(I include federated search, known item enrichment, record harvesting, record 
conversion and the like as the same type of work here.) They got it written, 
running and started expanding. Then they came to see us once the number of 
Sources reached about 2 dozen. At this point they found that the work of 
maintenance (Karen's concern) was actually starting to take whole numbers of 
FTE programmers, and it was becoming costly. The projections to hundreds of 
Sources were scary financially, yet there was/is demand for the capability. So 
they came to us to take advantage of economies of scale, where we can build and 
fix...and fix...and fix... just once for all our partners. That way it works. 
It also works on a small scale with well-known, well-defined Sources. (More 
horror stories can wait for the pub.)

Integration with ILSs (we have our system integrated with two, two more were 
integrated but have gone their own way, and we are developing with a fifth one, 
so some experience): Generally this is technically challenging because the ILS 
catalogue is display software which is built to handle results from a single 
source - the search engine of the catalogue. Thus integrating other results 
into the "data stream" is just not possible without development. So you have 
to go about it some other way.
 
First Possibility is to layer the extra functionality on top of the 
OPAC. Then this code becomes the OPAC and a lot of work has to be done to 
replicate the OPAC functions correctly down to the underlying ILS. And some of 
the ILS vendors have a hissy fit about this - just saying. 

Second Possibility is to do what Dan has done and make the extra 
functionality a display-level action. In other words, create a browser-based 
client which does what you want in terms of aggregating records. Again, our 
experience has been that this does not make ILS vendors feel all warm and 
cuddly, but there is not a lot they can do about it - they do not own the 
users' browsers. A version of this approach is what Umlaut does (as I 
understand it - which could be very wrong :-( ) where the additional 
functionality is server based, but is an adjunct to the main OPAC.

Third Possibility is to go right to the backend and put the integration 
between the ILS and the search engine for the OPAC. Thus the OPAC talks 
to the federator, and it queries the OPAC database and any other Sources, 
presenting all the results as one stream, as if from the ILS database. 

Surprisingly (maybe - it was to me a long while ago) the easiest way to do this 
- with tech help - is the third. And it seems to be the one which gives the ILS 
vendors the least qualms. (Some caveats there - so see the reply to Karen's 
concerns below.) With most ILSs running as client-server architectures (as far 
as their DB/search engine are concerned), there is a natural break point to 
make use of. But this is not just a bit of universally applicable JavaScript - 
it is unique to each ILS, and only makes sense in a dedicated installation with 
the technical resources to implement and maintain it, or in a situation like 
ours, where we can make use of that one integration to add access to thousands 
of extra sources, consequently fitting all (so far) user requirements.

Karen's point about approaching Vendors and their concern about stability of 
Source(s). (Well, a lot of comment from others as well, but she started it.) 
This is a concern and, as I said above, one of the reasons why vendors work 
with us. We can guarantee a stable API for the vendor ILS whatever the vagaries 
of the actual Source. And I think that would be vital. The ILS vendors are not 
interested in crafting lots of API or parsing code and fixing it continuously. 
So OL would have to guarantee a stable API, which met minimum functionality 
requirements, and keep it running for at least a dozen years. We are still 
running the first API we produced some 10 years ago as there are deployed 
systems out there which use it and they are not going to be replaced any time 
soon. The users lose out on new functionality (a lot of it!), but cannot or 
will not pay for the upgraded ILS. A subsidiary advantage of this "stable third 
party supplier" scenario is that Karen's last query ("graceful!
  degradation?") is our problem, not the ILS's. We have to handle that and 
notify the ILS, and it 

Re: [CODE4LIB] JHU integration of PD works

2011-06-15 Thread Peter Noerr
I would just like to confirm from years of practical experience that Jonathan 
is right - this is hard technically. Not in principle, but the devil is in the 
details and they are all different, and often change. The very neat addition to 
the JHU catalog that Eric reported on that started this thread 
(https://catalyst.library.jhu.edu/catalog/bib_816990) is an example of what we 
call secondary searching and/or enrichment. 

And it is available - in our commercial software (not a plug - we don't sell 
it, just noting that it is not the sort of thing to try yourself on any scale - 
it takes a lot of resources). Our software is incorporated in the offerings of 
a number of the ILS and content vendors. Admittedly almost exclusively for 
federated searching, but the problems are the same. And Jonathan enumerates 
them pretty well below. So, to answer Karen's question, it can be done if the 
ILS vendors make the functionality available, and the libraries configure it.

Peter

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Jonathan Rochkind
> Sent: Wednesday, June 15, 2011 10:34 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] JHU integration of PD works
> 
> On 6/15/2011 10:55 AM, Karen Coyle wrote:
> >
> > I've been struggling with this around the Open Library digital texts:
> > how can we make them available to libraries through their catalogs?
> > When I look at the install documentation for Umlaut [1](I was actually
> > hoping to find a "technical requirements" list), it's obvious that it
> > takes developer chops.
> 
> This isn't neccesarily un-fixable. I have plans to make it easier --
> it's totally possible to make it easier (largely because Rails, on which
> Umlaut is based, has gotten so much better at being easier to
> install/deploy things and have em Just Work), I just need to find time
> (that I'm having trouble finding) to make the changes.
> 
> Eric, as well as Karen,  also asked why no vendors seem interested in
> supplying a product like this -- may be a bit of a chicken and an egg,
> there may not be a market for it -- I have trouble explaining to people
> why Umlaut is actually really cool in the first place, even other
> libraries. Although these conversations help me learn new ways to
> talk/think about it.
> 
> So, I can definitely make Umlaut easier to install and run -- but there
> are still going to be some technical craziness, involved with dealing
> with your local metadata in all it's local idiosyncracies, and dealing
> with matching it to 'remote' data in a way that meets local use cases.
> Like I said before, this is inherently imperfect, but that means that
> there are a bunch of choices to make about what imperfect trade-offs you
> want to make, and these inevitably have to do with the nature of your
> local (mostly cataloging) metadata, and the use cases you are supporting.
> 
> Really, I'm not sure I have faith in our existing vendors to be able to
> do a good job with it -- this is a really complicated thing that Umlaut
> is trying to do, in the end. (from my experience; it didn't sound that
> complicated at first, but it ends up so. Trouble-shooting problems ends
> up being incredibly complex, because there are so many different systems
> involved, and a bug or bad metadata on any one can mess things up).
> 
> So I guess what I'm saying is, if you're talking about Umlaut's approach
> -- it is a technically hard problem in our existing environment.
> ("existing environment" means our really bad local cataloging metadata,
> our multiple silo's of local metadata, and our pretty awful 'link
> resolver' products with poor API's, etc -- also the third party content
> host's poor metadata, lack of API's, etc.  None of these things are
> changing anytime soon). So if you're talking about this approach in
> particular, when Erik asks "is it technical or is political" -- my
> experience with Umlaut definitely definitely says 'technical', not
> 'political'. I've gotten no opposition to what Umlaut's trying to do,
> once people understand it, only dissatisfaction with how well it does it
> (a technical issue).
> 
> Jonathan


Re: [CODE4LIB] Group-sourced Google custom search site?

2011-05-11 Thread Peter Noerr
Just curious: what do you mean by "Some way to avoid the site-scrapers who 
populate the troubleshooting pages." (last sentence below)?

I presume you are wishing to avoid the "trouble shooting" sites which consist 
of nothing more than pages copied from other sites, and look only at the prime 
source pages for information?

Peter

> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy 
> Harper
> Sent: Monday, May 02, 2011 2:15 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Group-sourced Google custom search site?
> 
> That reminds me - I was looking last week into the possibility of making a
> Google custom search site with either a whitelist of trusted technology
> sites, or a blacklist of sites to exclude.  I haven't looked into whether
> the management of that could be group-sourced, but maybe someone else here
> has thought about this.  I haven't looked into the terms of service of
> custom search sites, either.  But of course slashdot was high on the
> whitelist.  I was thinking about sites for several purposes - general
> technology news and opinion, or specific troubleshooting / programming
> sites.  Some way to avoid the site-scrapers who populate the troubleshooting
> pages.
> 
> 
> Cindy Harper, Colgate U.


Re: [CODE4LIB] json challenge, or hackers wanted

2010-09-24 Thread Peter Noerr
For a start - not as a replacement for a hacker producing something for you - 
you might want to investigate what some of the existing jQuery plug-ins (not 
the cool/kewl UI widgets) can do already. Try 
http://plugins.jquery.com/projects/plugins?type=54 . As part of a similar 
investigation I have been impressed with the functionality of "DataTables" from 
that list. I have not used it in anger yet, but it does a lot of what you are 
looking for.
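
As an illustration of the data half of Eric's challenge below (steps 1 and 2 - 
the display half is exactly what a plug-in like DataTables would then consume), 
a minimal Python sketch; the file names and column names ("title", 
"word_count") are hypothetical:

    # Minimal sketch: turn the matrix (as a CSV export) into the JSON file the
    # browser-side library would slurp up.  Column names here are hypothetical.
    import csv
    import json

    def matrix_to_json(csv_path="great_books.csv", json_path="great_books.json"):
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))            # one dict per book
        for row in rows:                              # numeric columns arrive as strings
            for key, value in list(row.items()):
                try:
                    row[key] = float(value)
                except ValueError:
                    pass                              # leave titles, authors, etc. alone
        with open(json_path, "w") as f:
            json.dump(rows, f, indent=2)
        return rows

    if __name__ == "__main__":
        books = matrix_to_json()
        # the kind of slicing the jQuery layer would then do client-side
        longest = sorted(books, key=lambda b: b.get("word_count", 0), reverse=True)[:5]
        print([b.get("title") for b in longest])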

Peter Noerr

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Eric 
> Lease Morgan
> Sent: Friday, September 24, 2010 6:09 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] json challenge, or hackers wanted
> 
> This is a JSON challenge, or a hackers wanted call. Specifically, I am 
> looking for leads on how to
> slurp up a JSON file and create a cool (or "kewl") Web interface to the data. 
> Let me explain.
> 
> I have created a small matrix consisting of about 125 rows by 125 columns. 
> Each row represents a book
> in the series called the Great Books of the Western World. Columns include 
> identifiers, word counts,
> grade levels, readability scores, and integers I call "Great Idea 
> Coefficients". For more information
> about this data, see the blog posting. [1]
> 
> Here's the challenge:
> 
>   1. convert the matrix into a JSON object
>   2. save the object as a file
>   3. write a Javascript library allowing the
>  patron to manipulate, aggregate, summarize,
>  chart, and display variations of the JSON
> 
> For example, slurp up the JSON and simply display a pretty list of all the 
> elements. Allow the user to
> sort the list by author, title, length, or any one of the Coefficients. Allow 
> the user to select only
> the items authored by Shakespeare and display the same sort of... sorts. 
> Allow the user to select all
> the items with a love Coefficient greater than n, sort them by n, and 
> illustrate the result using a
> bar chart. Create a scatter plot denoting any relationships between length of 
> book and its "greatness".
> Allow the user to drag and drop selected items into a container (a div 
> element) and summarize them
> according to grade level or readability. Etc.
> 
> The goal is to allow the patron to analyze the texts -- do "distant reading" 
> -- and to create many
> different visualizations.
> 
> Ideally this Javascript library would exploit JQuery for all of its cool user 
> interface
> characteristics.
> 
> In the end, the techniques used to quantitatively describe the Great Books 
> could be applied to other
> texts (other books, blog postings, open access journal articles, etc.), and 
> this Javascript library
> could be used as a part of a "next, next generation library catalog" or 
> "discovery system".
> 
> Fun?
> 
> [1] blog - http://infomotions.com/blog/2010/09/great-books-data-dictionary/
> 
> --
> Eric Morgan
> University of Notre Dame


Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Agreed it is a problem. What MSSEs do (when operating this way) is make this 
issue a response time dependent one. Users themselves make it a Source 
dependent one (they only look at results from the sites they decide to search). 
Ranking algorithms make it an algorithm dependent one (their algorithm will 
determine what is top of the list).

In all cases the results are vying for the few slots that the user will 
actually look at - "above the fold", "first 3", "first page", etc. The problem 
is that all results cannot be first, and we do not have any way to insist that 
the user look at all of them and make an informed selection. Anyway, this can 
go all the way back to the collection policies of the library and the 
aggregators, and even the cussedness of authors in not writing articles on 
exactly the right topic. (Bad authors!) 

The MSSEs try to be even-handed about it, but it doesn't always work. Possible 
saving technologies here are text analysis and faceting. These can help take 
"horizontal slices" out of the vertically ordered list of results. That means 
the users can select another list which will be ordered a bit differently and, 
with text analysis and facets applied again, can slice and dice those results 
further. But, in the end, it requires enough interest from the user to do some 
refinement, and that battles with "good enough".
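
As a rough sketch of what those horizontal slices look like in code - purely 
illustrative, with invented field names rather than any real MSSE schema:

    # Illustrative facet counting over an interleaved result list; "source" and
    # "subject" are invented field names.
    from collections import Counter

    results = [
        {"title": "A", "source": "PsycINFO",  "subject": "psychology"},
        {"title": "B", "source": "Catalogue", "subject": "dentistry"},
        {"title": "C", "source": "PubMed",    "subject": "dentistry"},
    ]

    def facet(records, field):
        """Count the values of one field across the whole vertically ordered list."""
        return Counter(r.get(field, "unknown") for r in records)

    def slice_by(records, field, value):
        """Take a horizontal slice: only the records sharing one facet value."""
        return [r for r in records if r.get(field) == value]

    print(facet(results, "subject"))                  # e.g. Counter({'dentistry': 2, 'psychology': 1})
    print(slice_by(results, "subject", "dentistry"))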

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Walker, David
> Sent: Wednesday, May 19, 2010 1:18 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
> - OASIS SRU and CQL, access to most-current drafts
> 
> > And if the majority of users are only looking at results
> > from one resource... why do a broadcast multi-server
> > search in the first place?
> 
> More than just a theoretical concern.  Consider this from an article by
> Nina McHale:
> 
> "[R]eference and instruction staff at Auraria were asked to draw up a
> list of ten or so resources that would be included in a general-focus
> “Quick Search” . . . [h]owever, in practice, the result was
> disappointing. The results returned from the fastest resource were the
> results on top of the pile, and of the twelve resources chosen,
> PsycINFO routinely returned results first. Reference and instruction
> staff rightly felt that this skewed the results for a general query."
> [1]
> 
> One library' perspective, and I'm pretty sure they were not using Muse.
> But conceptually the concern would be the same.
> 
> --Dave
> 
> [1] http://webserviceslibrarian.blogspot.com/2009/01/why-reference-and-
> instruction.html
> 
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> 
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind [rochk...@jhu.edu]
> Sent: Wednesday, May 19, 2010 12:45 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
> - OASIS SRU and CQL, access to most-current drafts
> 
> Wait, but in the case you suspect is common, where you return results
> as
> soon as the first resource is returned, and subsequent results are
> added
> to the _end_ of the list
> 
> I'm thinking that in most of these cases, the subsequent results will
> be
> several pages "in", and the user will never even get there. And if the
> majority of users are only looking at results from one resource... why
> do a broadcast multi-server search in the first place?
> 
> Peter Noerr wrote:
> > However things are a bit different now...  At the risk of opening the
> debate once more and lots of lengthy discussion let me say that our
> experience (as one of the handful of commercial providers of "multi-
> server search engines" (MSSEs? - it'll never stick, but I like it)) is:
> >
> > 1) Times are not slow for most installations as they are set by
> default to provide incremental results in the fashion Jakub suggests
> ("First In, First Displayed"). So users see results driven by the time
> of the fastest Source, not the slowest. This
> means that, on average, getting the results from a MSSE can be faster
> than doing the same search on all of the native sites (just talking
> response times here, not the fact it is one search versus N). Do the
> maths - it's quite fun. 
> >
> > 2) The average "delay" for just processing the results through modern
> MSSEs is about 0.5 sec. Add to this say another 0.2 for two extra
> network hops and the additional respons

Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Aha, but we get interleaved results from the different Sources. So the results 
are not "all A", "all B", "all... Even if the results come as complete "sets of 
10", we internally collect them asynchronously as they are processed. The 
number of buffers and processing stages is quite large, so the parallel-processing 
nature of multi-tasking means that the results get interleaved. It 
is still possible that one set of results comes in so far in advance of 
everything else that it is completely processed before anything else arrives; 
in that case the display is "all A", then "others".
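
A toy sketch of that asynchronous collection, with invented Source names and 
timings - just to show how the interleaving falls out of the multi-tasking 
rather than being imposed:

    # Toy model of incremental, interleaved collection from several Sources.
    # Source names, batch counts and delays are invented; real connectors are messier.
    import asyncio, random

    async def search_source(name, queue):
        for batch in range(3):                        # each Source returns "sets of 10"
            await asyncio.sleep(random.uniform(0.1, 0.5))   # network + parsing time
            await queue.put(f"{name} batch {batch + 1}")
        await queue.put((name, None))                 # completion marker for this Source

    async def federate(sources):
        queue = asyncio.Queue()
        tasks = [asyncio.create_task(search_source(s, queue)) for s in sources]
        finished, display = 0, []
        while finished < len(sources):
            item = await queue.get()
            if isinstance(item, tuple):               # a Source has finished
                finished += 1
            else:
                display.append(item)                  # first in, first displayed
                print(item)
        await asyncio.gather(*tasks)
        return display

    asyncio.run(federate(["Catalogue", "PsycINFO", "PubMed"]))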

However the major benefit is that the results from all the Sources are there at 
once, so even if the user uses the system to "skip" from Source to Source, it 
is quicker than running the search on all the Sources individually. And, yes, 
you can individually save "a few here", "one or two there" to make your 
combined chosen few. 

But, first page only viewing does mean that the fastest Sources get the best 
spots. Is this an incentive to speed up the search systems? (Actually it has 
happened that a couple of the Sources who we showed comparative response time 
to, did use the figures to get funds for hardware replacement.)

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, May 19, 2010 12:45 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
> - OASIS SRU and CQL, access to most-current drafts
> 
> Wait, but in the case you suspect is common, where you return results
> as
> soon as the first resource is returned, and subsequent results are
> added
> to the _end_ of the list
> 
> I'm thinking that in most of these cases, the subsequent results will
> be
> several pages "in", and the user will never even get there. And if the
> majority of users are only looking at results from one resource... why
> do a broadcast multi-server search in the first place?
> 
> Peter Noerr wrote:
> > However things are a bit different now...  At the risk of opening the
> debate once more and lots of lengthy discussion let me say that our
> experience (as one of the handful of commercial providers of "multi-
> server search engines" (MSSEs? - it'll never stick, but I like it)) is:
> >
> > 1) Times are not slow for most installations as they are set by
> default to provide incremental results in the fashion Jakub suggests
> ("First In, First Displayed"). So users see results driven by the time
> of the fastest Source, not the slowest. This
> means that, on average, getting the results from a MSSE can be faster
> than doing the same search on all of the native sites (just talking
> response times here, not the fact it is one search versus N). Do the
> maths - it's quite fun. 
> >
> > 2) The average "delay" for just processing the results through modern
> MSSEs is about 0.5 sec. Add to this say another 0.2 for two extra
> network hops and the additional response time to first display is about
> 3/4 of a second. This is a time shift all the way down the set of
> results - most of which the user isn't aware of as they are beyond the
> first 10 on screen, and the system allows interaction with those 10
> while the rest are getting their act together. So, under 1 second is
> added to response times which average about 5 seconds. Of course,
> waiting for all the results adds this time to the slowest results.
> >
> > 3) Most users seem happy to get things back faster and not worry too
> much about relevance ranking. To combat the response time issue for
> users who require ranked results, the incremental return can be set to
> show interfiled results as the later records come in and rank within
> the ones displayed to the user. This can be disconcerting, but making
> sure the UI doesn't lose track of the user's focus is helpful. Another
> option is to show that "new results" are available, and let the user
> manually click to get them incorporated - less intrusive, but an extra
> click!
> >
> > General experience with the incremental displays shows that users are
> happiest with them when there is an obvious and clear reason for the
> new additions. The most accepted case is where the ranking criterion is
> price, and the user is always happy to see a cheaper item arrive. It
> really doesn't work well for titles sorted alphabetically - unless the
> user is looking for a specific title which should occur at the
> beginning of the list. And these examples illustrate the general point
> - that if the user is focused on specific items

Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Since we generally return results asynchronously to client systems from our 
MSSE (fed/meta/broadcast/aggregated/parallel/Multi-Server/Search Engine) I 
would just point out that we use other protocols than SRU when doing so. When 
we do use SRU on the client side, then we send back the results in a complete 
set. Otherwise we send them in tranches on a timescale controlled by the client 
system, usually about every 2 seconds.
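
Purely as an illustration of what consuming such tranches looks like from the 
client side - the endpoint, parameters, and response fields below are invented, 
not our actual interface or any SRU extension:

    # Hypothetical client loop for tranche-style delivery; the URL and JSON
    # fields are invented for illustration only.
    import json, time, urllib.parse, urllib.request

    def fetch_tranches(base="http://example.org/federator", query="dentistry",
                       interval=2.0):
        records, finished = [], False
        while not finished:
            params = urllib.parse.urlencode({"query": query, "start": len(records)})
            with urllib.request.urlopen(f"{base}?{params}") as resp:
                tranche = json.load(resp)
            records.extend(tranche.get("records", []))
            finished = tranche.get("complete", False)
            if not finished:
                time.sleep(interval)                  # the client-controlled ~2 second timescale
        return records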

Obviously an SRU-async protocol is possible, but would it be used? As an MSSE we 
would use it to get results from Sources, so they could be processed earlier 
(smaller response time) and more smoothly. But that would require that Source 
servers implement it, and what would their incentive be to do so? 

For direct use with end users it would mean a browser client capable of 
retrieving and managing the partial data is needed. Middleware systems (between 
the MSSE and the user) would need to support it, and pass the benefit to the 
user. Any system doing heavy analysis of the results would probably not want 
(and may not be able) to start that analysis until all the results are 
obtained, because of the added messiness of handling partial result sets from 
multiple Sources (it is messy - believe me). 

I would be very happy to see such a protocol (and have it implemented), and if 
Jakub implemented browser code to handle that end, then the users could benefit.

Peter

Peter Noerr
CTO. MuseGlobal
www.museglobal.com

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jakub Skoczen
> Sent: Tuesday, May 18, 2010 12:51 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
> drafts
> 
> On Tue, May 18, 2010 at 9:17 PM, Ray Denenberg, Library of Congress
>  wrote:
> > First, no. There are extensibility features in SRU but nothing that
> would
> > help here.
> >
> > Actually, Jonathan, what I though you were suggesting was the
> creation of a
> > (I hesitate to say it) metasearch engine. I use that term because it
> is what
> > NISO called it, when they started their metasearch initiative five or
> so
> > years ago, to create a standard for a metasearch engine, but they got
> > distracted and the effort really came to nothing.
> 
> I'm not sure if Jonathan was suggesting that but that's exactly what I
> had in mind - using SRU 2.0 as a front-end protocol for a meta-search
> engine. And yes while creating a third-party, SRU-inspired protocol
> for that purpose could work, I see very little value in such exercise.
> I suspect that, as any standard, SRU has certain limitations and, as
> an implementer, you have to work around them but you do end up with an
> obvious gain: standards compliance. SRU-inspired protocol is not quite
> the same thing, and it's probably easier to go all the way and create
> a custom, proprietary protocol.
> 
> > The premise of the metasearch engine is that there exists a single-
> thread
> > protocol, for example, SRU, and the need is to manage many threads,
> which is
> > what the metasearch engine would have done if it had ever been
> defined. This
> > is probably not an area for OASIS work, but if someone wanted to
> revive the
> > effort in NISO (and put it on the right track) it could be useful.
> >
> > --Ray
> >
> >
> > -Original Message-
> > From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf
> Of
> > Jonathan Rochkind
> > Sent: Tuesday, May 18, 2010 2:56 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
> drafts
> >
> > Jakub Skoczen wrote:
> >>
> >>> I wonder if someone, like Kuba, could design an 'extended async
> SRU'
> >>> on top of SRU, that is very SRU like, but builds on top of it to
> add
> >>> just enough operations for Kuba's use case area.  I think that's
> the
> >>> right way to approach it.
> >>>
> >>
> >> Is there a particular "extensibility" feature in the protocol that
> >> allows for this?
> >>
> > I don't know, but that's not what I was suggesting. I was suggesting
> you
> > read the SRU spec, and then design your own "SRU-async" spec, which
> is
> > defined as "exactly like SRU 2.0, except it also has the following
> > operations, and is identified in an Explain document like X."
> >
> > Jonathan
> >
> 
> 
> 
> --
> 
> Cheers,
> Jakub


[CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
However things are a bit different now...  At the risk of opening the debate 
once more and lots of lengthy discussion let me say that our experience (as one 
of the handful of commercial providers of "multi-server search engines" (MSSEs? 
- it'll never stick, but I like it)) is:

1) Times are not slow for most installations as they are set by default to 
provide incremental results in the fashion Jakub suggests ("First In, First 
Displayed"). So users see results driven by the time of the fastest Source, not 
the slowest. This means that, on average, getting the 
results from a MSSE can be faster than doing the same search on all of the 
native sites (just talking response times here, not the fact it is one search 
versus N). Do the maths - it's quite fun. 
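(To make that concrete with purely illustrative numbers: if five Sources take 
2, 3, 4, 6, and 10 seconds to respond, the first screen from the MSSE appears 
after roughly 2 + 0.7 = 2.7 seconds - the fastest Source plus the roughly 3/4 
second overhead described in point 2 below - whereas repeating the search on 
each native site in turn costs 2 + 3 + 4 + 6 + 10 = 25 seconds of cumulative 
waiting, and even the single slowest site alone takes 10.)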

2) The average "delay" for just processing the results through modern MSSEs is 
about 0.5 sec. Add to this say another 0.2 for two extra network hops and the 
additional response time to first display is about 3/4 of a second. This is a 
time shift all the way down the set of results - most of which the user isn't 
aware of as they are beyond the first 10 on screen, and the system allows 
interaction with those 10 while the rest are getting their act together. So, 
under 1 second is added to response times which average about 5 seconds. Of 
course, waiting for all the results adds this time to the slowest results.

3) Most users seem happy to get things back faster and not worry too much about 
relevance ranking. To combat the response time issue for users who require 
ranked results, the incremental return can be set to show interfiled results as 
the later records come in and rank within the ones displayed to the user. This 
can be disconcerting, but making sure the UI doesn't lose track of the user's 
focus is helpful. Another option is to show that "new results" are available, 
and let the user manually click to get them incorporated - less intrusive, but 
an extra click!

General experience with the incremental displays shows that users are happiest 
with them when there is an obvious and clear reason for the new additions. The 
most accepted case is where the ranking criterion is price, and the user is 
always happy to see a cheaper item arrive. It really doesn't work well for 
titles sorted alphabetically - unless the user is looking for a specific title 
which should occur at the beginning of the list. And these examples illustrate 
the general point - that if the user is focused on specific items at the top of 
the list, then they are generally happy with an updating list; if they are more 
in "browse" mode, then the distraction of the updating list is just that - a 
distraction, if it is on screen. 

Overall our experience from our partner's users is that they would rather see 
things quickly than wait for relevance ranking. I suspect partly (can of worms 
coming) because the existing ranking schemes don't make a lot of difference 
(ducks quickly).

Peter

Peter Noerr
CTO, Museglobal
www.museglobal.com

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Walker, David
> Sent: Tuesday, May 18, 2010 12:44 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
> drafts
> 
> > in order to provide decent user experience you need to be
> > able to present some results "sooner" than others.
> 
> I would actually question whether this is really necessary.
> 
> A few years back, I did a big literature review on metasearch, as well
> as a looked at a good number of usability studies that libraries did
> with metasearch systems.
> 
> One thing that stood to me out was that the literature (written by
> librarians and technologists) was very concerned about the slow search
> times of metasearch, often seeing it as a deal-breaker.
> 
> And yet, in the usability studies, actual students and faculty were far
> less concerned about the search times -- within reason, of course.
> 
> I thought the UC Santa Cruz study [1] summarized the point well: "Users
> are willing to wait as long as they think that they will get useful
> results. Their perceptions of time depend on this belief."
> 
> Trying to return the results of a metasearch quickly just for the sake
> of returning them quickly I think introduces other problems (in terms
> of relevance ranking and presentation) that do far more to negatively
> impact the user experience.  Just my opinion, of course.
> 
> --Dave
> 
> [1]
> http://www.cdlib.org/services/d2d/metasearch/docs/core_ucsc_oct2004usab
> ility.pdf
> 
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> 

Re: [CODE4LIB] What do you want out of a frbrized data web service?

2010-04-21 Thread Peter Noerr
For our fed search service we very much echo Jonathan's real-time 
requirements/use case (we don't build indexes, so bulk download is not of 
interest):

access - real-time query (purpose - to enhance data about items found by other 
means)
query - by standard IDs (generally this is "known item" augmentation, so 
"discovery" queries by keywords, etc are not so much required)
data format - almost anything "standard"  (we can translate it into the 
internal data model structure)
big value add - relationships, mainly the "upward" ones, towards work
data quantity - all details of directly related items, plus 2nd level links, 
possibly all details all the way up to (and including) the work (this is a 
trade-off of processing time on the service side to gather this information, 
and on our side to de-construct vs. the time to set up and manage multiple 
service calls to get the data about individual items in the link chain. In our 
experience it is almost always quicker to get it "all-at-once" than to send 
repeated messages, even if the total amount of data is less in the latter. But, 
mileage may vary here.) 
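
To make the "known item in, related items out" pattern concrete, a minimal 
sketch - the service URL, parameter names, and JSON fields are invented for 
illustration, not the actual V/FRBR service:

    # Hypothetical known-item augmentation call; everything about the endpoint
    # and response shape is invented for illustration only.
    import json, urllib.parse, urllib.request

    def related_manifestations(oclc_number, base="http://example.org/frbr"):
        url = f"{base}/lookup?{urllib.parse.urlencode({'oclc': oclc_number})}"
        with urllib.request.urlopen(url) as resp:
            record = json.load(resp)
        work_id = record["work_id"]                   # the "upward" link towards the work
        siblings = record.get("manifestations", [])   # other manifestations of that work
        return work_id, siblings

    # e.g. work_id, others = related_manifestations("12345678")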


Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, April 21, 2010 7:59 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] What do you want out of a frbrized data web
> service?
> 
> So, okay, the "value added" stuff you have will indeed be relationships
> between entities, which is not too unexpected.
> 
> So, yes, I would want a real-time query service (for enhancement of
> individual items on display in my system) _as well as_ a bulk download
> (for enhancing my data on indexing).
> 
> For real time query, I'd have a specific entity at hand roughly
> corresponding to a 'manifestation'.  I'd want to look it up in your
> system by any identifiers I have (oclcnum, lccn, isbn, issn; any other
> music-related identifiers that are useful?) to find a match. Then I'd
> want to find out it's workset ID (or possibly expression ID?) in your
> system, and be able to find all the OTHER manifestations/expressions in
> those sets, from your system, with citation details about those items.
> (Author, title, publisher, year, etc; also oclcnum/lccn/isbn/issn/etc
> if
> available. Just giving me Marc with everything might be sufficient).
> If you have work identifiers from other systems that correspond to your
> workID (OCLC workID? etc), I'd want to know those.
> 
> For bulk download, yeah, I'd just want everything you could give me,
> really.
> 
> Some of the details can't really be spec'd in advance, it requires an
> interative process of people trying to use it and seeing what they need.
> I know this makes things hard from a grant-funded project management
> perspective.
> 
> Jonathan
> 
> Riley, Jenn wrote:
> > On 4/20/10 7:18 PM, "Jonathan Rochkind"  wrote:
> >
> >
> >> But first, to really answer the question, we need some more
> information
> >> from you. What data do you actually have of value? Just saying "we
> have
> >> FRBRized data" doesn't really tell me, "FRBRized data" can be almost
> >> anything, really.   Can you tell us more about what value you think
> >> you've added to your data as a result of your "FRBRization"?  What
> do
> >> you have that wasn't there before?  Better relationships between
> >> manifestations?  Something else?
> >>
> >
> > Heh, I was intentionally vague in an attempt to avoid skewing the
> discussion
> > in certain directions, but I was obviously *too* vague - my apologies.
> Here
> > are the sorts of things we'd imagined and are looking to prioritize:
> >
> > - Give me a list of all manifestations that match some arbitrary
> query terms
> > - Given this manifestation identifier, show me all expressions on it
> and
> > what works they realize
> > - Give me a list of all works that match some arbitrary query terms
> > - Given this work identifier, show all expressions and manifestations
> of it
> > - Show me all of the people who match some arbitrary query terms
> (women
> > composers in Vienna in the 1860s, for example)
> > - Which works have expressions with this specific relationship to
> this
> > particular known person?
> >
> > Basically we're exploring when we should support queries as words vs.
> > previously-known identifiers, when a response will all be a set of
> records
> > for the same entity vs. several different ones with the relationships
> > between them recorded, to what degree answering a query will involve
> > traversing lots of relationships - stuff like that. Having some real
> use
> > cases will help us decide what kind of a service to offer and what
> > technology we'll use to implement that service.
> >
> > We do hope to also be able to publish Linked Data in some form -
> that's
> > probably going to come a little later, but it's definitely on "the
> list".
> >
> > To answer one of your other questions, the V/FRBR project is focusing
> on
> > musica

Re: [CODE4LIB] Works API

2010-03-30 Thread Peter Noerr
I will just add (again) to the request for all links. As Jonathan says the 
client can then decide what to show, how to group them, and so on. 

I had rather sloppily elided things like format of full text into my 
"structural" information about the link. 

And I second the request that some simple coding (controlled vocabulary, 
anyone?) be used for these values so that we clients can determine what we are 
seeing.

Thanks  -  Peter


> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> stuart yeates
> Sent: Tuesday, March 30, 2010 18:20
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Works API
> 
> Jonathan Rochkind wrote:
> > Karen Coyle wrote:
> >> The OL only has full text links, but the link goes to a page at the
> >> Internet Archive that lists all of the available formats. I would
> >> prefer that the link go directly to a display of the book, and offer
> >> other formats from there (having to click twice really turns people
> >> off, especially when they are browsing). So unfortunately, other than
> >> "full text" there won't be more to say.
> >
> > In an API, it would be _optimal_ if you'd reveal all these links, tagged
> > with a controlled vocabulary of some kind letting us know what they are,
> > so the client can decide for itself what to do with them (which may not
> > even be immediately showing them to any user at all, but may be
> > analyzing them for some other purpose).
> 
> Even better, for those of us who have multiple formats of full text (TEI
> XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs
> to the full text, differentiated using the mime-type.
> 
> cheers
> stuart
> --
> Stuart Yeates
> http://www.nzetc.org/   New Zealand Electronic Text Centre
> http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] Works API

2010-03-30 Thread Peter Noerr
For our purposes (federated search) it would be most useful to have as many of 
the available links (OL or other) as possible, and as much information about 
the link as possible. Obvious "structural" stuff like the type of identifier, 
but also the nature of the linked object (as you suggest "full text", "scan", 
etc.) This enables the links to be "categorized" in the user display so they 
can eliminate the ones not of interest, or focus on those that are.
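
As a sketch of what that categorization could look like on the client side - 
the response shape and the link "type" vocabulary here are invented, not OL's 
actual API:

    # Hypothetical typed-link response and a client-side grouping of the links;
    # field names and the type vocabulary are invented for illustration.
    from collections import defaultdict

    work_response = {
        "title": "Moby Dick",
        "links": [
            {"url": "http://example.org/ia/mobydick",     "type": "full-text", "format": "epub"},
            {"url": "http://example.org/ia/mobydick.pdf", "type": "scan",      "format": "pdf"},
            {"url": "http://example.org/ol/edition/1",    "type": "edition",   "year": 1922},
        ],
    }

    def categorize(links):
        """Group links by their controlled 'type' so the UI can show or hide each group."""
        groups = defaultdict(list)
        for link in links:
            groups[link.get("type", "other")].append(link)
        return dict(groups)

    print(categorize(work_response["links"]))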

Anything which differentiates the links from the perspective of the user is 
generally useful. In this regard, some information about the editions at the 
ends of the links (even just a number and/or date) would help, and would stop 
systems coming back to OL multiple times for all the linked records only to 
extract and display one or two bits of information. That has got to be the 
worst case for user response time, and almost certainly for load on the OL 
system. So if a certain amount of this information can be statically 
pre-coordinated with the links, or gathered by OL at request time, it has got 
to be more efficient.

For us the format of the records is of little importance as we convert them 
anyway.

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Karen Coyle
> Sent: Tuesday, March 30, 2010 10:23
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Works API
> 
> Open Library now has Works defined, and is looking to develop an API
> for their retrieval. It makes obvious sense that when a Work is
> retrieved via the API, that the data output would include links to the
> Editions that link to that Work. Here are a few possible options:
> 
> 1) Retrieve Work information (author, title, subjects, possibly
> reviews, descriptions, first lines) alone
> 2) Retrieve Work information + OL identifiers for all related Editions
> 3) Retrieve Work information + OL identifiers + any other identifiers
> related to the Edition (ISBN, OCLC#, LCCN)
> 4) Retrieve Work information and links to Editions with full text / scans
> 
> Well, you can see where I'm going with this. What would be useful?
> 
> kc
> 
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] code4lib / elag2010

2009-06-04 Thread Peter Noerr
As a long time founder member of ELAG I think this is an excellent idea. In its 
early days (for maybe the first 10 years) ELAG was a very technical meeting 
(yes, you *can* be geeky about mainframe software - all we had back then), but 
it has moved from that over time for all the best of reasons. I can't speak 
about the meeting in Bratislava, as I wasn't there, but it has over the years 
become more edu-torial and less cutting edge. I think it would benefit from the 
addition of a whole day of decidedly technical content in addition to the 
review and descriptive papers and its own very strong workshop format.

Peter Noerr

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Frumkin, Jeremy
> Sent: Thursday, June 04, 2009 08:51
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] code4lib / elag2010
> 
> Hi Nicolas -
> 
> I think what you are planning to do is great, and I personally can't see
> any issues with what you've described. There is a code4lib NW event today
> in Oregon, and there have been a number of "regional" code4lib events put
> together by members of the community.
> 
> To me, holding events like this are exactly what code4lib is all about.
> 
> 
> -- jaf
> 
> ==
> Jeremy Frumkin
> Assistant Dean / Chief Technology Strategist
> University of Arizona Libraries
> 
> frumk...@u.library.arizona.edu
> +1 520.307.4548
> ==
> 
> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Nicolas Morin
> Sent: Thursday, June 04, 2009 8:34 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] code4lib / elag2010
> 
> Hello,
> 
> A number of the code4lib community members are based in Europe : some
> of us where at the code4lib conference in the US in February; some of
> us where at the ELAG conference in Bratislava in Europe last April.
> There has been some talks about a code4lib conference in Europe in the
> past. At ELAG in April this idea came up again. But as Eric Lease
> Morgan noted in this forum a while ago on this subject, it would not
> be a good idea to duplicate efforts. ELAG itself is not as hands-on
> and geeky as code4lib : Ross Singer, who was at ELAG this year, said
> he found it more like Access. And while we don't want to duplicate
> efforts, we feel there's a place for a more technically oriented
> event.
> So, after some discussion with the people from ELAG, here's what we
> propose for 2010 : not a Code4lib Europe conference, but an addition
> to the existing ELAG conference : a one-day, pre-conference seminar
> for the code4lib community. Something hands-on, technical. Something
> about Lucene/solr in it's many incarnations was suggested, but we're
> very much open to other suggestions at this stage.
> 
> I want to make sure that the code4lib community finds it appropriate
> that we use the code4lib "space" to setup something that's going to be
> tied to another conference (ELAG). I don't think there's an issue
> here, but I want to make sure no one feels the code4lib "brand" is
> being inappropriately used.
> 
> I also want to make sure that those of you who are interested can
> participate in the setting up of this pre-conference : I opened a
> (more or less blank, at this stage) wiki page for this at
> http://wiki.code4lib.org/index.php/Code4lib/elag2010
> 
> The code4lib community members who have expressed interest in setting
> this up so far are : Jakob Voss, Etienne Posthumus, Peter VanBoheemen,
> Till Kinstler and myself. If you're interested in this effort, feel
> free to go edit the wiki page.
> 
> You can get more information about ELAG 2009 at
> http://indico.ulib.sk/elag2009
> ELAG 2010 will be hosted by the Finnish national library in Helsinki,
> in June 2010
> 
> Cheers,
> Nicolas
> 
> --
> Nicolas Morin
> Mobile: +33(0)633 19 11 36
> http://www.biblibre.com


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I agree with Ross wholeheartedly. Particularly in the use of an RDF-based 
mechanism to describe, and then have systems act on, the semantics of these 
uniquely identified objects. Semantics (as in Web) has been exercising my 
thoughts recently, and the problems we have here are writ large over all that 
the SW people are trying to achieve. Perhaps we can help...

Peter 

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Friday, May 01, 2009 13:40
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them All
> 
> Ideally, though, if we have some buy in and extend this outside our
> communities, future identifiers *should* have fewer variations, since
> people can find the appropriate URI for the format and use that.
> 
> I readily admit that this is wishful thinking, but so be it.  I do
> think that modeling it as SKOS/RDF at least would make it attractive
> to the Linked Data/Semweb crowd who are likely the sorts of people
> that would be interested in seeing URIs, anyway.
> 
> I mean, the worst that can happen is that nobody cares, right?
> 
> -Ross.
> 
> On Fri, May 1, 2009 at 3:41 PM, Peter Noerr  wrote:
> > I am pleased to disagree to various levels of 'strongly" (if we can agree
> on a definition for it :-).
> >
> > Ross earlier gave a sample of a "crossw3alk' for my MARC problem. What he
> supplied
> >
> > -snip
> > We could have something like:
> > <http://purl.org/DataFormat/marcxml>
> >  .  "MARC21 XML" .
> >  .  "info:srw/schema/1/marcxml-v1.1" .
> >  .  "info:ofi/fmt:xml:xsd:MARC21" .
> >  .  "http://www.loc.gov/MARC21/slim"; .
> >  .  http://purl.org/DataFormat/marc .
> >  .  "..." .
> >
> > Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
> really the point.  The point is that all of these various identifiers would
> be valid, but we'd have a real way of knowing what they actually mean.
>  Maybe this is what you mean by a crosswalk.
> > --end
> >
> > Is exactly what I meant by a "crosswalk". Basically a translating
> dictionary which allows any entity (system or person) to relate the various
> identifiers.
> >
> > I would love to see a single unified set of identifiers, my life as a
> wrangled of record semantics would be s much easier. But I don't see it
> happening.
> >
> > That does not mean we should not try. Even a unification in our space
> (and "if not in the library/information space, then where?" as Mike said)
> reduces the larger problem. However I don't believe it is a scalable
> solution (which may not matter if all of a group of users agree, they why
> not leave them to it) as, at any time one group/organisation/person/system
> could introduce a new scheme, and a world view which relies on unified
> semantics would no longer be viable.
> >
> > Which means until global unification on an object (better a (large) set
> of objects) is achieved it will be necessary to have the translating
> dictionary and systems which know how to use it. Unification reduces Ray's
> list of 15 alternative uris to 14 or 13 or whatever. As long as that number
> is >1 translation will be necessary. (I will leave aside discussions of
> massive record bloat, continual system re-writes, the politics of whose
> view prevails, the unhelpfulness of compromises for joint solutions, and so
> on.)
> >
> > Peter
> >
> >> -Original Message-
> >> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> >> Mike Taylor
> >> Sent: Friday, May 01, 2009 02:36
> >> To: CODE4LIB@LISTSERV.ND.EDU
> >> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to
> Rule
> >> Them All
> >>
> >> Jonathan Rochkind writes:
> >>  > Crosswalk is exactly the wrong answer for this. Two very small
> >>  > overlapping communities of most library developers can surely agree
> >>  > on using the same identifiers, and then we make things easier for
> >>  > US.  We don't need to solve the entire universe of problems. Solve
> >>  > the simple problem in front of you in the simplest way that could
> >>  > possibly work and still leave room for future expansion and
> >>  > improvement. From that, we learn how to solve the big problems,
> >>  > when we're ready. Overreach and try to solve the huge problem
> >>  > including every possible use

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I am pleased to disagree to various levels of "strongly" (if we can agree on a 
definition for it :-).

Ross earlier gave a sample of a "crosswalk" for my MARC problem. What he 
supplied

-snip
We could have something like:

  .  "MARC21 XML" .
  .  "info:srw/schema/1/marcxml-v1.1" .
  .  "info:ofi/fmt:xml:xsd:MARC21" .
  .  "http://www.loc.gov/MARC21/slim"; .
  .  http://purl.org/DataFormat/marc .
  .  "..." .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really 
the point.  The point is that all of these various identifiers would be valid, 
but we'd have a real way of knowing what they actually mean.  Maybe this is 
what you mean by a crosswalk.
--end

Is exactly what I meant by a "crosswalk". Basically a translating dictionary 
which allows any entity (system or person) to relate the various identifiers.
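
A minimal sketch of such a translating dictionary, using the identifiers 
already quoted above (the choice of canonical URI is mine, purely for 
illustration):

    # Minimal "translating dictionary": map alternative format identifiers onto
    # one canonical key.  The canonical URI chosen here is illustrative only.
    ALIASES = {
        "info:srw/schema/1/marcxml-v1.1": "http://purl.org/DataFormat/marcxml",
        "info:ofi/fmt:xml:xsd:MARC21":    "http://purl.org/DataFormat/marcxml",
        "http://www.loc.gov/MARC21/slim": "http://purl.org/DataFormat/marcxml",
    }

    def canonical(identifier):
        """Return the canonical format URI, or the identifier itself if unknown."""
        return ALIASES.get(identifier, identifier)

    def same_format(a, b):
        """Two identifiers denote the same format iff they share a canonical URI."""
        return canonical(a) == canonical(b)

    print(same_format("info:ofi/fmt:xml:xsd:MARC21", "http://www.loc.gov/MARC21/slim"))  # True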

I would love to see a single unified set of identifiers; my life as a wrangler 
of record semantics would be so much easier. But I don't see it happening. 

That does not mean we should not try. Even a unification in our space (and "if 
not in the library/information space, then where?" as Mike said) reduces the 
larger problem. However, I don't believe it is a scalable solution (which may 
not matter if all of a group of users agree - then why not leave them to it?) 
as, at any time, one group/organisation/person/system could introduce a new 
scheme, and a world view which relies on unified semantics would no longer be 
viable.

Which means that until global unification on an object (better, on a (large) 
set of objects) is achieved, it will be necessary to have the translating 
dictionary and systems which know how to use it. Unification reduces Ray's list 
of 15 alternative URIs to 14 or 13 or whatever. As long as that number is >1, 
translation will be necessary. (I will leave aside discussions of massive 
record bloat, continual system re-writes, the politics of whose view prevails, 
the unhelpfulness of compromises for joint solutions, and so on.)

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Mike Taylor
> Sent: Friday, May 01, 2009 02:36
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them All
> 
> Jonathan Rochkind writes:
>  > Crosswalk is exactly the wrong answer for this. Two very small
>  > overlapping communities of most library developers can surely agree
>  > on using the same identifiers, and then we make things easier for
>  > US.  We don't need to solve the entire universe of problems. Solve
>  > the simple problem in front of you in the simplest way that could
>  > possibly work and still leave room for future expansion and
>  > improvement. From that, we learn how to solve the big problems,
>  > when we're ready. Overreach and try to solve the huge problem
>  > including every possible use case, many of which don't apply to you
>  > but SOMEDAY MIGHT... and you end up with the kind of
>  > over-abstracted over-engineered
>  > too-complicated-to-actually-catch-on solutions that... we in the
>  > library community normally end up with.
> 
> I strongly, STRONGLY agree with this.  It's exactly what I was about
> to write myself, in response to Peter's message, until I saw that
> Jonathan had saved me the trouble :-)  Let's solve the problem that's
> in front of us right now: bring SRU into harmony with OpenURL in this
> respect, and the very act of doing so will lend extra legitimacy to
> the agreed-on identifiers, which will then be more strongly positioned
> as The Right Identifiers for other initiatives to use.
> 
>  _/|_  ___
> /o ) \/  Mike Taylor
> http://www.miketaylor.org.uk
> )_v__/\  "You cannot really appreciate Dilbert unless you've read it in
>the original Klingon." -- Klingon Programming Mantra


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Peter Noerr
I just wanted to be sure that the larger extent of this problem was raised. Two 
(or 4) groups solving the issue is a great start. 

However what you learn here may not be applicable in the large. And some of us 
do have this large problem today. So we work through it in small steps in an 
extensible fashion - which for me is not attempting to create the overall grand 
unified set of everything.

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Thursday, April 30, 2009 18:53
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them All
> 
> Technically it's 4 communities, but, yes, only two currently have
> "credible" registries in place.
> 
> -Ross.
> 
> On Thu, Apr 30, 2009 at 9:28 PM, Jonathan Rochkind 
> wrote:
> > Crosswalk is exactly the wrong answer for this. Two very small
> overlapping communities of most library developers can surely agree on
> using the same identifiers, and then we make things easier for US.  We
> don't need to solve the entire universe of problems. Solve the simple
> problem in front of you in the simplest way that could possibly work and
> still leave room for future expansion and improvement. From that, we learn
> how to solve the big problems, when we're ready. Overreach and try to solve
> the huge problem including every possible use case, many of which don't
> apply to you but SOMEDAY MIGHT... and you end up with the kind of over-
> abstracted over-engineered too-complicated-to-actually-catch-on solutions
> that... we in the library community normally end up with.
> > ____
> > From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Peter
> Noerr [pno...@museglobal.com]
> > Sent: Thursday, April 30, 2009 6:37 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them All
> >
> > Some further observations. So far this threadling has mentioned only
> trying to unify two different sets of identifiers. However there are a much
> larger number of them out there (and even larger numbers of schemas and
> other "standard-things-that-everyone-should-use-so-we-all-know-what-we-are-
> talking-about") and the problem exists for any of these things
> (identifiers, etc.) where there are more than one of them. So really
> unifying two sets of identifiers, while very useful, is not actually going
> to solve much.
> >
> > Is there any broader methodology we could approach which potentially
> allows multiple unifications or (my favourite) cross-walks. (Complete
> unification requires everybody agrees and sticks to it, and human history
> is sort of not on that track...) And who (people and organizations) would
> undertake this?
> >
> > Ross' point about a lightweight approach is necessary for any sort of
> adoption, but this is a problem (which plagues all we do in federated
> search) which cannot just be solved by another registry.
> Somebody/organisation has to look at the identifiers or whatever and decide
> that two of them are identical or, worse, only partially overlap and hence
> scope has to be defined. In a syntax that all understand of course. Already
> in this thread we have the sub/super case question from Karen (in a post on
> the openurl (or Z39.88  - identifiers!) listserv). And the various
> identifiers for MARC (below) could easily be for MARC-XML, MARC21-ISO2709,
> MARCUK-ISO2709. Now explain in words of one (computer understandable)
> syllable what the differences are.
> >
> > I'm not trying to make problems. There are problems and this is only a
> small subset of them, and they confound us every day. I would love to adopt
> standard definitions for these things, but which Standard? Because anyone
> can produce any identifier they like, we have decided that the unification
> of them has to be kept internal where we at least have control of the
> unifications, even if they change pretty frequently.
> >
> > Peter
> >
> >
> > Dr Peter Noerr
> > CTO, MuseGlobal, Inc.
> >
> > +1 415 896 6873 (office)
> > +1 415 793 6547 (mobile)
> > www.museglobal.com
> >
> >
> >> -Original Message-
> >> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> >> Ross Singer
> >> Sent: Thursday, April 30, 2009 12:00
> >> To: CODE4LIB@LISTSERV.ND.EDU
> >> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
> Them
> >> All
> >>
> >> Hello everybody.  I apologize for the

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Peter Noerr
Some further observations. So far this threadling has mentioned only trying to 
unify two different sets of identifiers. However there are a much larger number 
of them out there (and even larger numbers of schemas and other 
"standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about")
 and the problem exists for any of these things (identifiers, etc.) where there 
are more than one of them. So really unifying two sets of identifiers, while 
very useful, is not actually going to solve much.

Is there any broader methodology we could adopt which potentially allows 
multiple unifications or (my favourite) cross-walks? (Complete unification 
requires that everybody agree and stick to it, and human history is sort of not on 
that track...) And who (which people and organizations) would undertake this?

Ross is right that a lightweight approach is necessary for any sort of adoption, 
but this is a problem (one which plagues all we do in federated search) that 
cannot be solved just by another registry. Some body or organisation has to look at 
the identifiers and decide that two of them are identical or, worse, that they 
only partially overlap, in which case the scope of each has to be defined - in a 
syntax that all understand, of course. Already in this thread we have the sub/super 
case question from Karen (in a post on the OpenURL (or Z39.88 - 
identifiers!) listserv). And the various identifiers for MARC (below) could 
easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of 
one (computer-understandable) syllable what the differences are. 

I'm not trying to make problems. There are problems, this is only a small 
subset of them, and they confound us every day. I would love to adopt standard 
definitions for these things, but which Standard? Because anyone can produce 
any identifier they like, we have decided that the unification of them has to 
be kept internal, where we at least have control of the unifications, even if 
they change pretty frequently.
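
To make that concrete, here is a minimal, purely illustrative sketch of the kind 
of internal crosswalk I mean (the canonical names and the resolve() helper are 
hypothetical; the info URIs are the real ones cited in this thread):

    # Purely illustrative: a tiny internal crosswalk between format identifiers
    # minted by different registries. The canonical keys are our own invention.
    CROSSWALK = {
        "marcxml": {
            "info:srw/schema/1/marcxml-v1.1",   # SRU schema registry
            "info:ofi/fmt:xml:xsd:MARC21",      # OpenURL format registry
        },
        "onix": {
            "info:srw/schema/1/onix-v2.0",      # SRU (ONIX 2.0)
            "info:ofi/fmt:xml:xsd:onix",        # OpenURL (ONIX 2.1) - partial overlap
        },
    }

    # Reverse index so an incoming identifier resolves to our internal name.
    REVERSE = {uri: name for name, uris in CROSSWALK.items() for uri in uris}

    def resolve(identifier):
        """Map an external format identifier to our internal canonical name."""
        return REVERSE.get(identifier)

    print(resolve("info:srw/schema/1/marcxml-v1.1"))  # -> marcxml
    print(resolve("info:ofi/fmt:xml:xsd:onix"))       # -> onix
    print(resolve("info:unknown/format"))             # -> None

The hard part, of course, is not the table but deciding (and recording) when two 
entries really are the same thing and when they only partially overlap.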

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Ross Singer
> Sent: Thursday, April 30, 2009 12:00
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
> All
> 
> Hello everybody.  I apologize for the crossposting, but this is an
> area that could (potentially) affect every one of these groups.  I
> realize that not everybody will be able to respond to all lists,
> but...
> 
> First of all, some back story (Code4Lib subscribers can probably skip
> ahead):
> 
> Jangle [1] requires URIs to explicitly declare the format of the data
> it is transporting (binary marc, marcxml, vcard, DLF
> simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
> own URI structure for this (http://jangle.org/vocab/formats#...) but
> this has always been with the intention of moving out of
> jangle.org into a more "generic" space so it could be used by other
> initiatives.
> 
> This same concept came up in UnAPI [2] (I think this thread:
> http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-
> March/thread.html#682
> discusses it a bit - there is a reference there that it maybe had come
> up before) although it was ultimately rejected in favor of an (optional)
> approach more in line with how OAI-PMH disambiguates metadata formats.
>  That being said, this page used to try to set a sort of convention
> around the UnAPI formats:
> http://unapi.stikipad.com/unapi/show/existing+formats
> But it's now just a squatter page.
> 
> Jakob Voss pointed out that SRU has a schema registry and that it
> would make sense to coordinate with this rather than mint new URIs for
> things that have already been defined there:
> http://www.loc.gov/standards/sru/resources/schemas.html
> 
> This, of course, made a lot of sense.  It also made me realize that
> OpenURL *also* has a registry of metadata formats:
> http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataP
> refix=oai_dc&set=Core:Metadata+Formats
> 
> The problem here is that OpenURL and SRW are using different info URIs
> to describe the same things:
> 
> info:srw/schema/1/marcxml-v1.1
> 
> info:ofi/fmt:xml:xsd:MARC21
> 
> or
> 
> info:srw/schema/1/onix-v2.0
> 
> info:ofi/fmt:xml:xsd:onix
> 
> The latter technically isn't the same thing since the OpenURL one
> claims it's an identifier for ONIX 2.1, but if I wasn't sending this
> email now, eventually SRU would have registered
> info:srw/schema/1/onix-v2.1
> 
> There are several other examples, as well (MODS, ISO20775, etc.) and
> it's not a

Re: [CODE4LIB] exact title searches with z39.50

2009-04-29 Thread Peter Noerr
To sidestep the strict/relaxed issue and face the real world of spotty 
implementation of standards (which seems to apply however arcane, or not, they 
are), we provide a configurable "strictness" flag and the ability to map 
non-supported indexes and some functions to supported ones on a Source-by-Source 
basis. Admins can decide whether users get this strict/relaxed switch, and users 
can apply it or not. In both cases the majority choice is "not" (i.e. relaxed is 
used).
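
For illustration only, the relaxed mapping amounts to something like this when 
building a Bib-1/PQF title query (the Source names and capability flags below are 
invented, not our actual configuration; 1=4 and 6=3 are the standard Bib-1 title 
and completeness=complete-field attributes):

    # Illustrative sketch: per-Source "strictness" handling for an exact-title search.
    SOURCES = {
        "source_a": {"supports_complete_field": True},
        "source_b": {"supports_complete_field": False},  # ignores or garbles 6=3
    }

    def title_query(source, title, strict=True):
        if strict and SOURCES[source]["supports_complete_field"]:
            # strict: title index plus completeness = complete field
            return '@attr 1=4 @attr 6=3 "%s"' % title
        # relaxed fallback: plain title index, no completeness attribute
        return '@attr 1=4 "%s"' % title

    print(title_query("source_a", "Moby Dick"))                # strict query sent
    print(title_query("source_b", "Moby Dick"))                # silently relaxed
    print(title_query("source_a", "Moby Dick", strict=False))  # user chose relaxed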

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind
> Sent: Tuesday, April 28, 2009 08:43
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] exact title searches with z39.50
> 
> It can be a chicken-egg thing too.  Maybe more users would be doing more
> sophisticated searches if they actually _worked_.
> 
> Plus I know that I could write systems to use federated search to embed
> certain functionality in certain places, if more sophisticated searches
> worked more reliably.
> 
> Walker, David wrote:
> > I'm not sure it's a _big_ mess, though, at least for metasearching.
> >
> > I was just looking at our metasearch logs this morning, so did a quick
> count: 93% of the searches were keyword searches.  Not a lot of exactness
> required there.  It's mostly in the 7% who are doing more specific searches
> (author, title, subject) where the bulk of the problems lie, I suspect.
> >
> > --Dave
> >
> > ==
> > David Walker
> > Library Web Services Manager
> > California State University
> > http://xerxes.calstate.edu
> > 
> > From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Ray
> Denenberg, Library of Congress [r...@loc.gov]
> > Sent: Tuesday, April 28, 2009 8:32 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] exact title searches with z39.50
> >
> > Right, Mike. There is a long and rich history of the debate between loose
> > and strict interpretation, in the world at large, and in particular,
> within
> > Z39.50, this debate raged from the late 1980s throughout the 90s.  The
> > faction that said "If you can't give the client what is asks for, at
> least
> > give them something; make them happy" was almost religious in its zeal.
> > Those who said "If you can't give the client what it asks for, be honest
> > about it; give them good diagnostic information, tell them a better way
> to
> > formulate the request, etc. But don't pretend the transaction was a
> success
> > if it wasn't" was shouted down most every time.   I can't predict, but
> I'm
> > just hoping that lessons have been learned from the mess that that
> mentality
> > got us into.
> >
> > --Ray
> >
> > - Original Message -
> > From: "Mike Taylor" 
> > To: 
> > Sent: Tuesday, April 28, 2009 10:43 AM
> > Subject: Re: [CODE4LIB] exact title searches with z39.50
> >
> >
> >
> >> Ray Denenberg, Library of Congress writes:
> >>
> >>>> The irony is that Z39.50 actually make _much_ more effort to
> >>>> specify semantics than most other standards -- and yet still
> >>>> finds itself in the situation where many implementations do not
> >>>> respond correctly to the BIB-1 attribute 6=3
> >>>> (completeness=complete field) which is how Eric should be able to
> >>>> do what he wants here.
> >>>>
> >>>> Not that I have any good answers to this problem ... but I DO
> >>>> know that inventing more and more replacement standards is NOT
> >>>> the answer.  Everything that's come along since Z39.50 has
> >>>> suffered from exactly the same problem but more so.
> >>>>
> >>> I think this remains to be seen for SRU/CQL, in particular for the
> >>> example at hand, how to search for exact title.  There are two
> >>> related issues: one, how arcane the standard is, and two, how
> >>> closely implementations conform to the intended semantics. And
> >>> clearly the first has a bearing on the second.
> >>>
> >>> And even I would say that Z39.50 is a bit on the arcane side when
> >>> it comes to formulating a query for exact title. With SRU/CQL there
> >>> is an "exact" relation ('exact' in 1.1,

Re: [CODE4LIB] Serials Solutions Summon

2009-04-21 Thread Peter Noerr
From one of the Federated Search vendors' perspective... 

It seems in the broader web world we in the library world have lost 
"metasearch". That has become the province of those systems (mamma, dogpile, 
etc.) which search the big web search engines (G,Y,M, etc.) primarily for 
shoppers and travelers (kayak, mobissimo, etc.) and so on. One of the original 
differences between these engines and the library/information world ones was 
that they presented results by Source - not combined. This is still evident in 
a fashion in the travel sites where you can start multiple search sessions on 
the individual sites.

We use "Federated Search" for what we do in the library/information space. It 
equates directly to Jonathan's Broadcast Search which was the original term I 
used when talking about it about 10 years ago. Broadcast is more descriptive, 
and I prefer it, but it seems an uphill struggle to get it accepted.

Fed Search has the problem Ray notes with "federated": it just means "a bunch 
of things brought together". It can be broadcast search (real-time searching of 
remote Sources and aggregation of a virtual result set), or searching of a 
local (to the searcher) index which is composed of material federated from 
multiple Sources at some previous time. We tend to use the term "Aggregate 
Index" for this (and for the Summon-type index) Mixed content is almost a 
given, so that is not an issue. And Federated Search systems have to undertake 
in real time the normalization and other tasks that Summon will be (presumably) 
putting into its aggregate index.

A problem in terminology we come across is the use of "local" (notice my 
careful caveat in its use above). It is used to mean local to the searcher (as 
in the aggregate/meta index above), or it is used to mean local to the original 
documents (i.e. at the native Source).

I can't imagine this has done more than confirm that there is no agreed 
terminology - which we sort of all knew. So we just do a lot of explaining - 
with pictures - to people.
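
(If a picture in code helps: below is a deliberately over-simplified sketch of the 
broadcast side. The connector functions, timeout and merge step are stand-ins, not 
our actual architecture; an aggregate index does the same normalization work ahead 
of time instead of at query time.)

    # Over-simplified broadcast search: query each remote Source in parallel at
    # search time, then merge the responses into a virtual result set.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def search_source_a(query):
        # stand-in for a real connector (Z39.50, SRU, vendor API, screen scrape...)
        return [{"title": "A result from Source A about %s" % query, "source": "A"}]

    def search_source_b(query):
        return [{"title": "A result from Source B about %s" % query, "source": "B"}]

    CONNECTORS = [search_source_a, search_source_b]

    def broadcast_search(query, timeout=10):
        merged = []
        with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
            futures = [pool.submit(connector, query) for connector in CONNECTORS]
            for done in as_completed(futures, timeout=timeout):
                merged.extend(done.result())  # normalization/de-duplication goes here
        return merged

    print(broadcast_search("federated search"))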

Peter Noerr


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com




> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind
> Sent: Tuesday, April 21, 2009 08:59
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Serials Solutions Summon
> 
> Ray Denenberg, Library of Congress wrote:
> >
> > Leaving aside metasearch and broadcast search (terms invented more
> recently)
> > it  is a shame if "federated" has really lost its distinction
> > from"distributed".  Historically, a federated database is one that
> > integrates multiple (autonomous) databases so it is in effect a
> virtual
> > distributed database, though a single database.I don't think
> that's a
> > hard concept and I don't think it is a trivial distinction.
> >
> 
> For at least 10 years vendors in the library market have been selling
> us
> products called "federated search" which are in fact
> distributed/broadcast search products.
> 
> If you want to reclaim the term "federated" to mean a local index, I
> think you have a losing battle in front of you.
> 
> So I'm sticking with "broadcast search" and "local index".  Sometimes
> you need to use terms invented more recently when the older terms have
> been used ambiguously or contradictorily.  To me, understanding the two
> different techniques and their differences is more important than the
> terminology -- it's just important that the terminology be understood.


Re: [CODE4LIB] Dutch Code4Lib

2009-01-26 Thread Peter Noerr
You can find out about the current (2009) meeting here 
(http://library.wur.nl/elag2008/elag2009.html). The program is set, but ELAG is 
built round workshops and it is probably possible to add a new one, even at 
this late date. Contact the program committee.
 
ELAG was formed about 25 years ago by people working for the national libraries 
of Europe and the larger universities, who were all struggling to build library 
automation systems and facing a number of real technical problems. Its original 
aim was, and is, to allow the exchange of techniques and experiences. Over the 
years it has changed as technology developed, often driven by the ELAG member 
organisations, and as the world changed. It holds a meeting every year in a 
different location within Europe to encourage a wider audience and to serve a 
secondary aim of education in each location for people who are not able to 
travel.
 
Peter Noerr
MuseGlobal
(ex British Library - founder member of ELAG)



From: Code for Libraries on behalf of Edward M. Corrado
Sent: Sat 2009-01-24 17:46
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Dutch Code4Lib



I really don't know much about ELAG besides I'm going there. I think
they have the program set, or just about set, for this year so I don't
know if anything could be done formally in 2009, but maybe an informal
thing could be arranged the day before or the day after? At the least
maybe contacts can be made for 2010?

Does anyone know more about ELAG?

Edward


On 1/23/09, Hamparian,Don  wrote:
> Do you see any opportunities to partner with them for a European meeting?
> Or is that more trouble than it's worth?
>
>
> -Original Message-
> From: Code for Libraries on behalf of Ross Singer
> Sent: Thu 1/22/2009 4:06 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Dutch Code4Lib
>
> Eric, this is a good point.  I will be at ELAG this year, and I think
> Ed Corrado will, too.
>
> Past presentations look to be very in line with Code4lib and, in fact,
> it was billed to me as "If you think of Access or Code4Lib but in a
> scenic European setting with great beer then you'll have a good idea
> of what we are planning" by Ron Davies, one of the coordinators.
>
> -Ross.
>
> On Thu, Jan 22, 2009 at 2:33 PM, Eric Lease Morgan  wrote:
>> On 1/22/09 1:02 PM, "Ed Summers"  wrote:
>>
>>> Wow, this sounds too good to be true. Perhaps this is premature, but
>>> do you think there might be interest in hosting a code4lib2010 in the
>>> Netherlands? (he asks selfishly).
>>
>> On another note, there is already a library conference that is apparently
>> very similar to the Access tradition and Code4Lib that takes place in
>> Europe, and I think it is called European Library Automation Group (ELAG).
>> See:
>>
>>  http://indico.ulib.sk/MaKaC/conferenceDisplay.py?confId=5
>>
>> While I would love to have a Code4Lib "thang" in Europe, maybe there is
>> something already in place. This year it is in Bratislava (Slovakia). Next
>> year I believe it takes place somewhere in Norway.
>>
>> --
>> Eric Morgan
>>
>


Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]

2008-07-11 Thread Peter Noerr
Hi Steve,

Thanks for a full reply.

We actually do combine data within enterprises, including from their ILS
and subscription Sources (article databases), and internal repositories.
"Of course" we claim we do it well - and I think we do. A library
background will enable you to face almost any shape of data with aplomb,
if not equanimity.

Data from varied sources is varied in structure, type of content and
level of detail, as you say. It *is* possible to combine it, but it
works best when there is some sort of "commonality" across the sources.
Fortunately most people when searching provide that focus, so the
theoretical problem is very rarely a practical one - and this business
is all about practical solutions. We do actually have a fair number of
the enterprise search engine vendors as partners where we act as a
selective harvesting capability for them and convert the syntax and
semantics of the harvested records into a uniformity they can easily
ingest and work their indexing magic on.

Fence sitting has a long and honourable tradition (both in the UK and
the US), and we 'back both horses' ourselves by being in both the
federated search and content integration space. Thus we are involved in both
the just-in-case harvesting and the just-in-time federated searching.

Final thought is that almost everybody we have dealt with is a "special
case" - most of them in the nicest possibly way - so, even for systems
like ours, customization is the order of the day. But that's what
computers allow us to do - adapt to users.
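
As a purely illustrative sketch of the "uniformity" step mentioned above (the 
field names and source shapes are invented for the example, not our actual 
mappings):

    # Map records from differently shaped sources onto one common schema
    # before handing them to an indexing engine.
    FIELD_MAPS = {
        "ils_catalog":   {"245a": "title", "100a": "author", "260c": "date"},
        "intranet_docs": {"subject": "title", "owner": "author", "modified": "date"},
    }

    def normalize(source, record):
        mapping = FIELD_MAPS[source]
        common = {target: record[field] for field, target in mapping.items() if field in record}
        common["origin"] = source  # keep provenance for later de-duplication
        return common

    print(normalize("ils_catalog", {"245a": "Moby Dick", "100a": "Melville, Herman"}))
    print(normalize("intranet_docs", {"subject": "Budget 2008", "owner": "Finance"}))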

Peter  

> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf
Of
> Steve Oberg
> Sent: Friday, July 11, 2008 12:15 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Enterprise Search and library collection
> [SEC=UNCLASSIFIED]
> 
> Peter,
> 
> Use a search engine and create an aggregated database/index of all
the
> > material from the organization, or use a federated search system to
> > search the repositories/catalogs/databases/etc. in real time? Did
you
> > consider both? And why the choice you made?
> 
> 
> I was not involved in the initial planning. I came in sort of halfway
> through and had to make a lot of the initial planning decisions work.
> (Even
> while I disagreed with some of those decisions.)  Again, my
perspective
> relates mostly to use of catalog data.   However, I would add that we
> did in
> fact have a federated search tool when I came but quickly discarded it
> because it couldn't do the more limited functionality we were hoping
> for it
> to accomplish (present best options to users for where to search among
> our
> databases and collections according to subject), let alone aggregate
or
> search across disparate data repositories.
> 
> Personally I find it very difficult to believe that a federated search
> such
> as what you provide at MuseGlobal can do this sort of enterprise
> combination
> of data well.  The data is not well structured (except for catalog
> data) and
> includes an extreme range of completeness and little commonality.
What
> is
> interesting to note, however, is that on the one hand, a vendor such
as
> yourself may claim that you can do this sort of stuff well (I'm not
> saying
> you said that, just that you might say that). On the other hand I find
> it
> interesting to note that the enterprise search tool vendor we have,
> coming
> from a completely different market and perspective, would readily
claim
> they
> can do "all that library stuff" -- that they do in fact offer true
> federated
> search. Which in my personal opinion isn't true at all.
> 
> But ideally I would answer your question in this way. I think there
> should
> be a combination of the two approaches, that this would be more
> practical
> and workable than just one or the other.  How's that for sitting on
the
> fence :-)
> 
> 
> > Build vs. Buy? It obviously has taken Steve and his colleagues a lot
> of
> > hard work to produce a nice looking system (except for all those big
> > black bits on the screen!) and it obviously takes maintenance (it is
> > 'fragile') Do you think it was/is worth it and if so why?
> 
> 
> My answer is, it is too soon to tell.  There are many reasons why our
> implementation is probably unique (and I don't mean to imply that it
is
> better than someone else's, just that I doubt it could readily be
> replicated
> elsewhere).  We have a number of very different requirements and use
> cases
> than what some other library settings might have.  We have a large
> number of
> constraints on the IT side.  We have had to do a lot of custom stuff
as
> a
> result. This is probably why it is fragile, more than because of
> deficiencies in any one piece such as the search tool itself.
> 
> But we are still, in my view, only at the very early stages of
> assessing the
> whole package's value for our users.  And we have very particular,
> demanding
> users.
> 
> In sum, we have had to buy AND build and so it isn't, again, a
question
> 

Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]

2008-07-10 Thread Peter Noerr
Hi Steve and Renata,

First the declaration of interest: I am the CTO of a federated search
system company. However I am not trying to suggest you should use our
(or any) federated search system (so I will, coyly, not attach a
signature to this email).

I am interested in your comments on either or both of two questions:

Use a search engine and create an aggregated database/index of all the
material from the organization, or use a federated search system to
search the repositories/catalogs/databases/etc. in real time? Did you
consider both? And why the choice you made?

Build vs. Buy? It obviously has taken Steve and his colleagues a lot of
hard work to produce a nice looking system (except for all those big
black bits on the screen!), and it obviously takes maintenance (it is
'fragile'). Do you think it was/is worth it, and if so why?

Peter Noerr

> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf
Of
> Steve Oberg
> Sent: Thursday, July 10, 2008 8:21 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Enterprise Search and library collection
> [SEC=UNCLASSIFIED]
> 
> Renata and others,
> 
> After posting my original reply I realized how dumb it was to respond
> but
> say, sorry, can't tell you more.  As an aside, this is one of the
> things
> that irritates me the most about working in a for profit environment:
> the
> control exerted by MPOW over just about anything. But hey, this is the
> job
> situation I've consciously chosen so, I guess I shouldn't complain.
> 
> Although I can't name names and go into detail about our
> implementation, I
> have "anonymized" screenshots of various aspects of it and posted
> details
> about it at
> http://familymanlibrarian.com/2007/01/21/more-on-turning-the-catalog-
> inside-out/
> Keep in mind that my involvement has been focused on the catalog side.
> A
> lot of the behind-the-scenes work also dealt with matching subject
> terms in
> catalog records to the much simpler taxonomy chosen for our website.
> You
> can imagine that it can be quite complicated to set up a good rule set
> for
> matching LCSH or MeSH terms effectively to a more generic set of
> taxonomy
> terms and have those be meaningful to end users. We are continually
> evaluating and tweaking this setup.
> 
> As far as other general details, this implementation involved a lot of
> people, in fact a team of about 15, some more directly and exclusively
> and
> others peripherally.  In terms of maintenance, day to day maintenance
> is
> handled by about three FTE.  Our library catalog data is refreshed
once
> a
> day, as is the citation database to which I referred in the previous
> email,
> and content from our web content management environment.  A few other
> repositories are updated weekly because their content isn't as
> volatile.
> The whole planning and implementation process took a year and is still
> really working through implementation issues. For example we recently
> upgraded the version of our enterprise search tool to a newer version
> and
> this was a major change requiring a lot of resources and it took a lot
> more
> time to do than expected.
> 
> I hope this additional information is helpful.
> 
> Steve
> 
> On Tue, Jul 8, 2008 at 1:11 AM, Dyer, Renata
> <[EMAIL PROTECTED]>
> wrote:
> 
> > Our organisation is looking into getting an enterprise search and I
> was
> > wondering how many libraries out there have incorporated library
> > collection into a 'federated' search that would retrieve a whole
lot:
> > a library collection items, external sources (websites, databases),
> > internal documents (available on share drives and/or records
> systems),
> > maybe even records from other internal applications, etc.?
> >
> >
> > I would like to hear about your experience and what is good or bad
> about
> > it.
> >
> > Please reply on or offline whichever more convenient.
> >
> > I'll collate answers.
> >
> > Thanks,
> >
> > Renata Dyer
> > Systems Librarian
> > Information Services
> > The Treasury
> > Langton Crescent, Parkes ACT 2600 Australia
> > (p) 02 6263 2736
> > (f) 02 6263 2738
> > (e) [EMAIL PROTECTED]
> >
> > <https://adot.sirsidynix.net.au/uhtbin/cgisirsi/ruzseo2h7g/0/0/49>
> >
> >
> >
> **
> > Please Note: The information contained in this e-mail message
> > and any attached files may be confidential information and
> > may also be the subject of legal professional privilege.  If you are
> > not the intended recipient, any use, disclosure or copying of this
> > e-mail is unauthorised.  If you have received this e-mail by error
> > please notify the sender immediately by reply e-mail and delete all
> > copies of this transmission together with any attachments.
> >
> **
> >


Re: [CODE4LIB] alpha characters used for field names

2008-06-25 Thread Peter Noerr
Eric, you might want to look at ISO 2709 - the standard defining the
format underlying MARC records. It allows alphabetic values for the tags, but
MARC itself does not. Wikipedia gives a brief summary, although the ISO
standard is only 6 pages long.
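
If you want to see exactly which tags your ILS is emitting, a few lines reading 
the ISO 2709 directory will list them, numeric or not. A rough sketch (it assumes 
well-formed records and a hypothetical file name, and does no character-set 
handling):

    # List every tag (including non-numeric ones such as FMT or CAT) found in a
    # file of ISO 2709 / MARC transmission-format records.
    RT = b"\x1d"  # record terminator
    FT = b"\x1e"  # field terminator

    def tags_in_file(path):
        seen = set()
        with open(path, "rb") as fh:
            data = fh.read()
        for raw in data.split(RT):
            end = raw.find(FT)               # the directory runs from byte 24
            if len(raw) < 24 or end < 24:    # (after the leader) to the first
                continue                     # field terminator
            directory = raw[24:end]
            for i in range(0, len(directory) - len(directory) % 12, 12):
                seen.add(directory[i:i + 3].decode("ascii", "replace"))  # 3-char tag
        return sorted(seen)

    for tag in tags_in_file("records.mrc"):
        print(tag, "" if tag.isdigit() else "<-- non-numeric tag")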

Peter Noerr

> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf
Of
> Eric Lease Morgan
> Sent: Wednesday, June 25, 2008 12:21 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] alpha characters used for field names
> 
> Are alpha characters used for field names valid in MARC records?
> 
> When we do dumps of MARC records our ILS often dumps them with FMT and
> CAT field names. So not only do I have glorious 246 fields and 100
> fields but I also have CAT fields and FMT fields. Are these features
> of my ILS -- extensions of the standard -- or really a part of MARC?
> Moreover, does something like Marc4J or MARC::Batch and friends deal
> with these alpha field names correctly?
> 
> --
> Eric Lease Morgan