[CODE4LIB] NISO Open Discovery Initiative - survey invitation

2012-09-12 Thread Peter Noerr
(with apologies for cross-posting)
The Open Discovery Initiative (ODI), a working group of the National 
Information Standards Organization (NISO), has been formed to develop a 
Recommended Practice related to the index-based discovery services for 
libraries. ODI aims to investigate and improve the ecosystem surrounding these 
discovery services, with a goal of broader participation of content providers 
and increased transparency to libraries.

An important component of our work involves gathering information from the key 
stakeholders: libraries, content providers, and developers of discovery 
products.

If you are involved in discovery services we request that you respond to our 
survey. The survey results will provide essential information to the workgroup 
as it develops recommended practices related to discovery services. A full 
report on the findings of this survey will be made available publicly on the 
NISO website later this year.

We are especially interested in input from:

• libraries that have implemented or plan to implement a discovery service, and
• organizations that potentially contribute content to one or more of these 
services:
  o primary publishers,
  o producers of aggregated databases of citation or full-text content for 
libraries, and
  o creators of abstracting and indexing services.

We anticipate that the survey will take about 20 minutes to complete. 
https://www.surveymonkey.com/s/QBXZXSB

All respondents who identify themselves will be entered into a drawing for one 
of six $25 Amazon e-gift cards, kindly sponsored by Ex Libris and SAGE. These 
respondents will also receive a copy of the aggregated results. Note that any 
results shared will be anonymous and only aggregate data will be released.

In addition, if you are interested in keeping up to date with ODI, please sign 
up for our Interest mailing list - http://www.niso.org/lists/opendiscovery

Thank you
ODI Working Group



Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google

2012-08-27 Thread Peter Noerr
I agree entirely that these would need to be collections of triples, each with 
its own set of attributes/metadata describing the collection. Basically a 
record with triples as the data elements.

But I see a bigger problem with the direction this thread has taken so far. The 
use of versions has been conditioned by the use of something like GitHub as the 
underlying versioning platform. But GitHub (and all software versioning 
systems) is based on temporal versions, where each version is, in some way, an 
evolved unit of the same underlying thing - a program or whatever. So the 
versions are really temporally, linearly related to each other, as well as 
related in terms of added or improved or fixed functionality. Yes, the codebase 
(the underlying thing) can fork or split in a number of ways, but the results 
are all versions of the same thing, progressing through time.

In the existing bibliographic case we have many records which purport to be 
about the same thing, but contain different data values for the same elements. 
And these are the "versions" we have to deal with, and eventually reconcile. 
They are not descendants of the same original; they are independent entities, 
whether they are recorded as singular MARC records or collections of LD 
triples. I would suggest that at all levels, from the triple or key/value field 
pair to the triple collection or fielded record, what we have are alternates, 
not versions. 
 
Thus the alternates exist at the triple level, and also at the collection level 
(the normal bibliographic unit record we are familiar with). And those 
alternates could then be allowed versions, which are the attempts to, in some 
way, improve the quality (your definition of what that means is as good as 
mine) over time. And within a closed group of alternates (of a single bib unit) 
these versioned alternates would (in a perfect world) iterate to a common 
descendant which had the same agreed, authorized set of triples. Of course this 
would only be the authorized form for those organizations which recognized the 
arrangement. 
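
To make the alternates/versions distinction concrete, here is a minimal sketch 
(Python, with invented class and field names, so purely illustrative) of a bib 
unit holding independent alternates, each carrying its own chain of versions:

from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)

@dataclass
class Alternate:
    """One organization's independent description of the bib unit."""
    source: str                  # who asserts this alternate (e.g. an agency code)
    versions: List[Set[Triple]] = field(default_factory=list)

    def revise(self, triples: Set[Triple]) -> None:
        # A version is a temporal improvement of *this alternate only*.
        self.versions.append(set(triples))

    def current(self) -> Set[Triple]:
        return self.versions[-1] if self.versions else set()

@dataclass
class BibUnit:
    """The closed group of alternates purporting to describe one thing."""
    unit_id: str
    alternates: List[Alternate] = field(default_factory=list)

    def agreed(self) -> Set[Triple]:
        # Triples present in every alternate's latest version: the candidate
        # authorized set the alternates may (in a perfect world) iterate towards.
        currents = [a.current() for a in self.alternates]
        return set.intersection(*currents) if currents else set()

In this toy model each of the copying organizations is one Alternate, and a 
correction becomes a new version of that alternate, so the spiral of states is 
recorded rather than overwritten.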

But allowing alternates and their versions does provide a method of tracking 
the original problem of three organizations endlessly copying each other to 
correct their data. In this model it would be an alternate/version spiral 
of states, rather than a flat circle of each changing version with no history, 
and no idea of which was master. (Try re-reading Stuart's (a), (b), (c) below 
with the idea of alternates as well as versions (of the Datasets). I think it 
would become clearer as to what was happening.) There is still no master, but 
at least the state changes can be properly tracked and checked by software 
(and/or humans) so the endless cycle can be addressed - probably by an outside 
(human) decision about the correct form of a triple to use for this bib 
entity.

Or this may all prove to be an unnecessary complication.

Peter


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
 stuart yeates
 Sent: Monday, August 27, 2012 3:42 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google
 
 These have to be named graphs, or at least collections of triples which can 
 be processed through
 workflows as a single unit.
 
 In terms of LD, their version needs to be defined in terms of:
 
 (a) synchronisation with the non-bibliographic real world (i.e. Dataset Z 
 version X was released at
 time Y)
 
 (b) correction/augmentation of other datasets (i.e Dataset F version G 
 contains triples augmenting
 Dataset H versions A, B, C and D)
 
 (c) mapping between datasets (i.e. Dataset I contains triples mapping between 
 Dataset J version K and
 Dataset L version M (and vice versa))
 
 Note that a 'Dataset' here could be a bibliographic dataset (records of 
 works, etc), a classification
 dataset (a version of the Dewey Decimal Scheme, a version of the Māori 
 Subject Headings, a version of
 Dublin Core Scheme, etc), a dataset of real-world entities to do authority 
 control against (a dbpedia
 dump, an organisational structure in an institution, etc), or some arbitrary 
 mapping between some
 arbitrary combination of these.
 
 Most of these are going to be managed and generated using current systems 
 with processes that involve
 periodic dumps (or drops) of data (the dbpedia drops of wikipedia data are a 
 good model here). git
 makes little sense for this kind of data.
 
 github is most likely to be useful for smaller niche collaborative 
 collections (probably no more than
 a million triples) mapping between the larger collections, and scripts for 
 integrating the collections
 into a sane whole.
 
 cheers
 stuart
 
 On 28/08/12 08:36, Karen Coyle wrote:
  Ed, Corey -
 
  I also assumed that Ed wasn't suggesting that we literally use github
  as our platform, but I do want to remind folks how far we are from
  having people friendly versioning software -- at least, none that I

Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-18 Thread Peter Noerr
We cried our eyes out in 1976 when this first came to our attention at the BL. 
Even more crying when we couldn't get rid of it in the MARC-I to MARC-II 
conversion (well before MARC21 was even a twinkle) - a lot of tears are 
gathering somewhere.

Peter



 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bill 
 Dueber
 Sent: Tuesday, April 17, 2012 5:50 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 
 and MARC21
 
 On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero sesunc...@gmail.com wrote:
 
  Actually Anglo and Francophone centric. And the USMARC style 245 was a
  poor replacement for the UKMARC approach (someone at the British
  Library hosted Linked Data meeting wondered why there were punctuation 
  characters included in the data in the title field. The catalogers wept 
  slightly).
 
  Simon
 
 
 
 Slightly? I cry my eyes out *every single day* about that. Well, every 
 weekday, anyway.
 
 
 --
 Bill Dueber
 Library Systems Programmer
 University of Michigan Library


Re: [CODE4LIB] presenting merged records?

2012-04-03 Thread Peter Noerr
such as ranging mentioned above. Useful for dates (how to deal with "c." for 
circa?) and not much else. Even so, dates can have their own idiosyncrasies if 
dealing with articles, not monographs: how to handle some with dd's, some with 
mm's, and some with only years, or some combination. A vote for the most 
frequent would seem to give the best chance of a full publication date, but can 
go seriously wrong - especially when confronted with dd/mm/yy and mm/dd/yy from 
different aggregators.
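
As a small illustration of the frequency vote (a sketch only; the ambiguity 
test is a simplification and the function name is mine):

from collections import Counter

def vote_date(dates):
    """Pick the most frequent publication-date string among duplicate records.

    Returns (winner, ambiguous): a value like 04/05/99 could be dd/mm/yy from
    one aggregator and mm/dd/yy from another, so even the winning vote can
    still be read two ways.
    """
    counts = Counter(d.strip() for d in dates if d and d.strip())
    if not counts:
        return None, False
    winner, _ = counts.most_common(1)[0]
    parts = winner.replace("-", "/").split("/")
    ambiguous = (
        len(parts) == 3
        and parts[0].isdigit() and parts[1].isdigit()
        and int(parts[0]) <= 12 and int(parts[1]) <= 12   # both could be a month
        and parts[0] != parts[1]
    )
    return winner, ambiguous
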
Other specialist processing deals with individual data elements such as journal 
titles and, further afield, phone numbers. Many of these processes can be used 
for virtual records for display, but generally are reserved for more thorough 
processing to normalize and sanitize the data in a later stage, which can 
involve recourse to outside authority tables and sources, and a lot of other 
stuff.

And really it is because merging means doing a lot of work, slowing down 
general processing and display, and producing sometimes iffy virtual records, 
that we have adopted the simplest display: choosing just one record as an 
exemplar, hiding the rest in a sub-list to keep the display clean, and allowing 
users to see for themselves and make decisions. Having said that, we do display 
a count, 
and often a list of the sources of the records as part of the head, and leave 
it at that. Pragmatism beats perfection.

Peter


  lot of configuration options, which makes it quite flexible.
 
  So, you can choose:
 
  1. Which fields need to be identical for a record to be merged at all.
  I was using author, title, edition when I first mailed the list, but
  have found allowing records with different publication dates to be
  merged just caused too many unpredictable problems, and have now added
  publication date to the list of required fields. The test for
  identical authors, dates etc is just a string comparison, so a
  proportion of records which ought to be merged by these criteria never
  are, due to typos, variant names, date formats, etc.
 
  2. What to do with fields which differ between records which are being
  merged. You can choose either 'unique', which appends all unique field
  values (this is what I use for subject headings, so exactly repeated
  subject headings are dropped, but variants are kept), or 'longest', 
  which picks the longest field value from all the candidates (this is
  what I use for abstracts).
 
  At the end of the process you have a merged record which has a 'head'
  with the merged record itself, but which contains each of the original
  records, so you could potentially do as you suggest and let users see
  any of the input records if they wanted. However, by default this
  isn't Marc but an internal format (a processed subset of the Marc
  input) so it may not be much use to most users.
 
  I'm finding the 'head' section is mostly quite usable but does often
  have individual fields with strange or repeated values (eg values
  identical apart from punctuation). So I'm doing some post-processing
  of my own on this, but it's very arbitrary at the moment.
 
  Graham
 
  On 03/30/12 01:09, Peter Noerr wrote:
  Hi Graham,
 
  What we do in our federated search system, and have been doing for some 
  few years, is basically
 give the designer a choice of what options the user gets for de-duped 
 records.
 
  Firstly de-duping can be of a number of levels of sophistication, and 
  many of them lead to the
 situation you have - records which are similar rather than identical. On 
 the web search side of
 things there are a surprising number of real duplicates (well maybe not 
 surprising if you study more
 than one page of web search engine results), and on Twitter the duplicates 
 well outnumber the original
 posts (many thanks 're-tweet').
 
  Where we get duplicate records the usual options are: 1) keep the first 
  and just drop all the rest.
 2) keep the largest (assumed to have the most information) and drop the rest. 
 These work well for WSE
 results where they are all almost identical (the differences often are just 
 in the advertising
 attached to the pages and the results), but not for bibliographic records.
 
  Less draconian is 3) Mark all the duplicates and keep them in the list (so 
  you get 1, 2, 3, 4, 5,
 5.1, 5.2, 5.3, 6, ...). This groups all the similar records together under 
 the sort key of the first
 one, and does enable the user to easily skip them.
 
  More user friendly is 4) Mark all duplicates and hide them in a sub-list 
  attached to the head
 record. This gets them out of the main display, but allows the user who is 
 interested in that record
 to expand the list and see the variants. This could be of use to you.
 
  After that we planned to do what you are proposing and actually merge 
  record content into a single
 virtual record, and worked on algorithms to do it. But nobody was interested. 
 All our partners (who
 provide systems to lots of libraries, both public, academic, and special

Re: [CODE4LIB] presenting merged records?

2012-03-29 Thread Peter Noerr
Hi Graham,

What we do in our federated search system, and have been doing for some few 
years, is basically give the designer a choice of what options the user gets 
for de-duped records.

Firstly, de-duping can be done at a number of levels of sophistication, and many 
of them lead to the situation you have - records which are similar rather than 
identical. On the web search side of things there are a surprising number of 
real duplicates (well, maybe not surprising if you study more than one page of 
web search engine results), and on Twitter the duplicates well outnumber the 
original posts (many thanks, 're-tweet').

Where we get duplicate records the usual options are: 1) keep the first and 
just drop all the rest. 2) keep the largest (assumed to have the most 
information) and drop the rest. These work well for WSE results where they are 
all almost identical (the differences often are just in the advertising 
attached to the pages and the results), but not for bibliographic records.

Less draconian is 3) Mark all the duplicates and keep them in the list (so you 
get 1, 2, 3, 4, 5, 5.1, 5.2, 5.3, 6, ...). This groups all the similar records 
together under the sort key of the first one, and does enable the user to 
easily skip them.

More user friendly is 4) Mark all duplicates and hide them in a sub-list 
attached to the head record. This gets them out of the main display, but 
allows the user who is interested in that record to expand the list and see 
the variants. This could be of use to you.
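
For what it's worth, the four options reduce to something as simple as this 
sketch (Python; record fields like "size" and "variants" are invented for the 
illustration, not our real API):

def present_duplicates(groups, option):
    """Flatten groups of duplicate records into one display list.

    groups is a list of lists; each inner list holds records judged to be
    duplicates of each other, with the head record first.  option is 1-4,
    matching the choices described above.
    """
    display = []
    for group in groups:
        if option == 1:                     # keep the first, drop the rest
            display.append(group[0])
        elif option == 2:                   # keep the largest record
            display.append(max(group, key=lambda r: r.get("size", 0)))
        elif option == 3:                   # keep all, grouped under the head
            display.extend(group)
        elif option == 4:                   # head record with a hidden sub-list
            head = dict(group[0])
            head["variants"] = list(group[1:])
            display.append(head)
    return display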

After that we planned to do what you are proposing and actually merge record 
content into a single virtual record, and worked on algorithms to do it. But 
nobody was interested. All our partners (who provide systems to lots of 
libraries - public, academic, and special) decided that it would confuse 
their users more than it would help. I have my doubts, but they spoke and we 
put the development on ice.

I'm not sure this will help, but it has stood the test of time, and is well 
used in its various guises. Since no-one else seems interested in this topic, 
you could email me off list and we could discuss what we worked through in the 
way of algorithms, etc.

Peter


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of graham
 Sent: Wednesday, March 28, 2012 8:05 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] presenting merged records?
 
 Hi Michael
 
 On 03/27/12 11:50, Michael Hopwood wrote:
  Hi Graham, do I know you from RHUL?
 
 Yes indeed :-)
 
  My thoughts on merged records would be:
 
  1. don't do it - use separate IDs and just present links between related 
  manifestations; thus
 avoiding potential confusions.
 
 In my case, I can't avoid it as it's a specific requirement: I'm doing a 
 federated search across a
 large number of libraries, and if closely similar items aren't merged, the 
 results become excessively
 large and repetitive. I'm merging all the similar items, displaying a summary 
 of the merged
 bibliographic data, and providing links to each of the libraries with a copy. 
  So it's not really
 FRBRization in the normal sense, I just thought that FRBRization would lead 
 to similar problems, so
 that there might be some well-known discussion of the issues around... The 
 merger of the records does
 have advantages, especially if some libraries have very underpopulated 
 records (eg subject fields).
 
 Cheers
 Graham
 
 
  http://www.bic.org.uk/files/pdfs/identification-digibook.pdf
 
  possible relationships - see 
  http://www.editeur.org/ONIX/book/codelists/current.html - lists 51
 (manifestation) and 164 (work).
 
  2. c.f. the way Amazon displays rough and ready categories (paperback,
  hardback, audiobooks, *ahem* ebooks of some sort...)
 
  On dissection and reconstitution of records - there is a lot of talk going 
  on about RDFizing MaRC
 records and re-using in various ways, e.g.:
 
  http://www.slideshare.net/JenniferBowen/moving-library-metadata-toward
  -linked-data-opportunities-provided-by-the-extensible-catalog
 
  Cheers,
 
  Michael
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
  Of graham
  Sent: 27 March 2012 11:06
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] presenting merged records?
 
  Hi
 
  There seems to be a general trend to presenting merged records to users, as 
  part of the move towards
 FRBRization. If records need merging this generally means they weren't 
 totally identical to start with,
 so you can end up with conflicting bibliographic data to display.
 
  Two examples I've come across with this: Summon can merge
  print/electronic versions of texts, so uses a new 'merged' material
  type of 'book/ebook' (it doesn't yet seem to have all the other
  possible permutations, eg book/audiobook). Pazpar2 (which I'm working
  with at the
  moment) has a merge option for publication dates which presents dates as a 
  period eg 

Re: [CODE4LIB] NON-MARC ILS?

2012-03-14 Thread Peter Noerr
There was a system developed back in the '80s which stored its records 
internally in a direct Entity-Relationship database and allowed inter-record 
linking and a rather hyperlink-like data structure. BUT... that was all 
internal. It allowed some very nice OPAC features and possibly easier 
cataloguing as authorities (basically subjects and authors) could be created 
once and linked into a bib record. Externally the system exchanged records in 
MARC. In fact in at least 15 different flavors of MARC. (It was built in Europe 
and was used to provide a transformation service, converting USMARC to various 
other MARC flavors for European distribution.)

MARC was, and is, an interchange format, so it is the format used to ship bib 
records between ILSs. It doesn't have to be used internally as the above system 
(which sold over 3,000 copies and has about 1,000 still active today, although 
it has been off the market for over 13 years) and InMagic and others show. In 
fact almost all the commercial systems do, as someone said previously, store 
the MARC records, not in ISO 2709 format, but shred them into some relational 
structure of tuples. But MARC is the language they all speak to each other. To 
change that would need an infrastructure, as also mentioned previously in this 
thread, to allow existing ILSs and repositories, based on MARC exchange, to 
interoperate with new ILSs, based on some other exchange. And that does mean 
hubs and repositories of transforming capabilities with very sophisticated 
semantics - and there really isn't any commercial case to create them. 
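
As a rough sketch of what that shredding amounts to (the parsed-record layout 
here is a simplified stand-in, not any particular vendor's schema):

def shred(record_id, fields):
    """Flatten one parsed MARC record into relational rows.

    fields is a list of (tag, ind1, ind2, [(subfield_code, value), ...])
    tuples, i.e. the shape a MARC parser would hand back.  Each subfield
    becomes one row an ILS can index and join on; ISO 2709 remains only the
    exchange serialization.
    """
    rows = []
    for tag, ind1, ind2, subfields in fields:
        for position, (code, value) in enumerate(subfields):
            rows.append((record_id, tag, ind1, ind2, code, position, value))
    return rows

# e.g. shred("bib001", [("245", "1", "0", [("a", "Example title /"), ("c", "A. Author.")])])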

And all of this is a long way from what Matt's actual question was. 

Peter

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
 Bigwood, David
 Sent: Wednesday, March 14, 2012 12:49 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] NON-MARC ILS?
 
 Yes, there are non-MARC systems out there. I think InMagic has some.
 LibraryThing could be used and doesn't require MARC.  There are some home 
 inventory programs that
 might do for a small church library or such.
 
 But what is the problem with MARC? The structure is fairly compact, compared 
 to XML, for instance. It
 does lack some granularity I'd like to see, but that would only make it more 
 complex if flexible. It
 would also be nice if it were possible to do more linking from the record. 
 But this only increases the
 complexity and makes it more difficult for local catalogers. Personally, I 
 kind of like MODS, but I'm
 not sure how much it would save.
 
 Is the problem with the rules on how to fill the MARC record? That has mostly 
 to do with AACR. The
 bibliographic universe is complex and getting more so. The rules for 
 description and access must take
 that into account. It is true that the small public library won't need the 
 same detail as a special
 collection or research university. Maybe there could be a simplified/stripped 
 down AACR? Or maybe RDA,
 the new standard will have that basic option?
 
 Or is your problem with the fields, their order and associated punctuation? 
 That is ISBD or FRBR. Both
 are based on common sense and what we experience as the necessary elements 
 from our work. They are not
 based on research on what the user wants and does. However, that gets to the 
 question "Who is the
 user?" The elementary child writing a report on the Civil War or a grad 
 student writing their
 dissertation, the mechanic looking for a wiring diagram for a '69 Ford, or a 
 birdwatcher planning their
 trip, the person looking for "do your own divorce"? Maybe Google searches could 
 provide some answers but
 do people look for different things and search differently in the library and 
 on-line? Fertile ground
 for some theses.
 
 The other thing to consider is the huge number of records available in MARC 
 format. A small public
 library probably has very little original cataloging to do. Local high school 
 yearbooks, some self-
 published family histories. Doing things differently locally would mean all 
 the common stuff would
 have to be done in-house, not just downloaded.
 
 Sincerely,
 David Bigwood
 dbigw...@gmail.com
 Lunar and Planetary Institute
 Catalogablog: http://catalogablog.blogspot.com
 
 On Mar 14, 2012, at 8:59 AM, Matt Amory wrote:
 
  Is there a full-featured ILS that is not based on MARC records?
  I know we love complexity, but it seems to me that my public library
  and its library network and maybe even every public library could
  probably do without 95% of MARC Fields and encoding, streamline
  workflows and save $ if there were a simpler standard.
  Is this what an Endeca-based system is about, or do those rare birds
  also use MARC in the background?
  Forgive me if the question has been hashed and rehashed over the
 years...
 
  --
  Matt Amory
  (917) 771-4157
  matt.am...@gmail.com
  http://www.linkedin.com/pub/matt-amory/8/515/239


Re: [CODE4LIB] Repositories, OAI-PMH and web crawling

2012-02-25 Thread Peter Noerr
This post veers nearer to something I was going to add as an FYI, so here 
goes...

FYI: NISO has recently started a working group to study best practices for 
discovery services. The ODI (=Open Discovery Initiative) working group is 
hoping to look at exactly this issue (how should a content provider tell a 
content requestor what it can have) among others (how to convey commercial 
restrictions, how to produce statistics meaningful to providers, discovery 
services, and consumers of the discovery service), and hopefully produce 
guidelines on procedures and formats, etc. for this. 

This is a new working group and its timetable doesn't call for any deliverables 
until Q3 of 2012, so it is a bit late to help Owen, but anyone who is 
interested in this may want to follow, from time to time, the NISO progress. 
Look at www.niso.org and find the ODI working group. If you're really 
interested contact the group to offer thoughts. And many of you may be 
contacted by a survey to find out your thoughts as part of the process, anyway. 
Just like the long reach of OCLC, there is no escaping NISO.

Peter   

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe 
 Hourcle
 Sent: Friday, February 24, 2012 10:20 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
 
 On Feb 24, 2012, at 9:25 AM, Kyle Banerjee wrote:
 
 
  One of the questions this raises is what we are/aren't allowed to do
  in terms of harvesting full-text. While I realise we could get into
  legal stuff here, at the moment we want to put that question to one
  side. Instead we want to consider what Google, and other search
  engines, do, the mechanisms available to control this, and what we
  do, and the equivalent mechanisms - our starting point is that we
  don't feel we should be at a disadvantage to a web search engine in
  our harvesting and use of repository records.
 
  Of course, Google and other crawlers can crawl the bits of the
  repository that are on the open web, and 'good' crawlers will obey
  the contents of robots.txt. We use OAI-PMH, and while we often see
  (usually general and sometimes
  contradictory) statements about what we can/can't do with the
  contents of a repository (or a specific record), it feels like there
  isn't a nice simple mechanism for a repository to say "don't harvest this 
  bit".
 
 
  I would argue there is -- the whole point of OAI-PMH is to make stuff
  available for harvesting. If someone goes to the trouble of making
  things available via a protocol that exists only to make things
  harvestable and then doesn't want it harvested, you can dismiss them
  as being totally mental.
 
 I see it like the people who request that their pages not be cached elsewhere 
 -- they want to make
 their object 'discoverable', but they want to control the access to those 
 objects -- so it's one thing
 for a search engine to get a copy, but they don't want that search engine 
 being an agent to distribute
 copies to others.
 
 Eg, all of the journal publishers who charge access fees -- they want people 
 to find that they have a
 copy of that article that you're interested in ... but they want to collect 
 their $35 for you to read
 it.
 
 In the case of scientific data, the problem is that to make stuff 
 discoverable, we often have to
 perform some lossy transformation to fit some metadata standard, and those 
 standards rarely have
 mechanisms for describing error (accuracy, precision, etc.).  You can do some 
 science with the catalog
 records, but it's going to introduce some bias into your results, so you're 
 typically better off
 getting the data from the archive.  (and sometimes, they have nice clean 
 catalogs in FITS, VOTable,
 CDF, NetCDF, HDF or whatever their discipline's preferred data format is)
 
 ...
 
 Also, I don't know if things have changed in the last year, but I seem to 
 remember someone mentioning
 at last year's RDAP (Research Data Access & Preservation) summit that Google 
 had coordinated with some
 libraries for feeds from their catalogs, but was only interested in books, 
 not other objects.
 
 I don't know how other search engines might use data from OAI-PMH, or if 
 they'd filter it because they
 didn't consider it to be information they cared about.
 
 -Joe


Re: [CODE4LIB] My crazed idea about dealing with registration limitations

2011-12-22 Thread Peter Noerr
Crazy variation number 3. Have two tracks which are identical, but time shifted 
by half a day (or some other convenient unit). The presenters talk twice on the 
same day - in the morning for track A and the afternoon for track B. That way 
there is no speaker gulag, no time over-run (though, following Declan's 
point, how much time is left out of the week after travelling, so why not the 
whole week), and you get a chance to hear a really interesting presentation 
twice - or miss it twice! Yes the interactions would be different (I would hope 
so), but that may be an advantage. Questions can be asked that got the time 
chop previously, more details can be added the second time round, attendees 
have more to compare over lunch/beer. The problem would be a herd following 
one presentation, so we have 500 in one and only 3 in the other. Room size 
limits (enforced) could help relieve that, or labeling people to their track 
and only allowing/encouraging mixing at intermediate events.

And streaming to a satellite meeting, say here in the Bay Area, where 10-15-20 
people could get together informally, gives them a chance to interact amongst 
themselves, if not with the whole group. (OK, that is crazy idea #4.)

Peter

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Corey 
 A Harper
 Sent: Thursday, December 22, 2011 8:44 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] My crazed idea about dealing with registration 
 limitations
 
 Cary,
 
 Good to know about your extensive experience w/ streaming.
 
 If you'll be in Seattle, would you be willing to add your name to the Video 
 Committee listing?
 http://wiki.code4lib.org/index.php/2012_committees_sign-up_page#Video_Committee
 
 Having people who actually know what they're doing involved in this effort 
 *this* year will help
 ensure that we're actually able to pull it off as effectively as IU did...
 
 Thanks,
 -Corey
 
 
 On Thu, Dec 22, 2011 at 10:42 AM, Cary Gordon listu...@chillco.com wrote:
  This is definitely doable, and potentially effective for a single
  track conference.
 
  I have been doing streaming as a volunteer for eight years and it
  keeps getting easier.
 
  Cary
 
  On Thu, Dec 22, 2011 at 7:33 AM, Wilfred Drew dr...@tc3.edu wrote:
  Here is another crazy idea; stream the event live for those who can't get 
  registered for the face
 to face version and provide a lower registration fee for them.
 
 
  -
  Wilfred (Bill) Drew, M.S., B.S., A.S.
  Assistant Professor
  Librarian, Systems and Tech Services/Electronic Resources/Serials
  Tompkins Cortland Community College  (TC3) Library:
  http://www.tc3.edu/library/
  Dryden, N.Y. 13053-0139
  Follow the library: http://twitter.com/TC3Library
  E-mail: dr...@tc3.edu
  Phone: 607-844-8222 ext.4406
  SKYPE/Twitter:BillDrew4
  SMS/TXT Me: 6072182217
  Website: http://BillTheLibrarian.com
  StrengthsQuest Strengths: Ideation, Input, Learner, Command,
  Analytical http://www.facebook.com/billdrew One thing about eBooks
  that most people haven't thought much is that eBooks are the very
  first thing that we're all able to have as much as we want other than 
  air. -- Michael Hart,
 Project Gutenberg. Please consider the environment before printing this 
 e-mail or document.
 
 
 
  --
  Cary Gordon
  The Cherry Hill Company
  http://chillco.com
 
 
 
 --
 Corey A Harper
 Metadata Services Librarian
 New York University Libraries
 20 Cooper Square, 3rd Floor
 New York, NY 10003-7112
 212.998.2479
 corey.har...@nyu.edu


Re: [CODE4LIB] Obvious answer to registration limitations

2011-12-22 Thread Peter Noerr
+1

Peter Noerr
MuseGlobal

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
 Schneider
 Sent: Thursday, December 22, 2011 11:11 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Obvious answer to registration limitations
 
 
  Also, is there any interest in a San Francisco Bay Area Code For
  Libraries Regional Affiliate (code4lib-sfbay for short)?
 
 
 +1
 
 If our bandwidth issues on campus get resolved, we'd offer our site, too.
 Our Valley Center for Performing Arts has a smaller theater on the lower 
 level that could work.
 Exploratory site visits welcome.
 
 Karen G. Schneider
 Holy Names University


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
Being no longer in Europe, I had completely missed the currently hot-potato 
definition of EMU. But it had a nice feel to it (sigh).

I agree with Karen below that a record seems more bounded and static, whereas a 
description varies according to need. And that is the distinction I was trying 
to get at: that the item stored in some database is everything unique about 
that entity - and is static, until some data actually changes, whereas the 
description is built at run time for the user and may contain some data from 
the item record, and some aggregated from other, linked, item records. The 
records all have long term existence in databases and the like, whereas the 
description is a view of all that stored data appropriate for the moment. It 
will only be stored as a processing intermediate result (as a record, since its 
contents are now fixed), and not long term, since it would be broken up into bits 
of entity data and stored in a distributed linked fashion (much like, as I 
understand it, the BL did when reading MARC records and storing them as entity 
updates.)

Having said all that, I don't like the term "description" as it carries a lot 
of baggage, as do all the other terms. But I'm stuck for another one.

Peter

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
 Coyle
 Sent: Tuesday, December 13, 2011 12:23 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
 
 Quoting Simon Spero s...@unc.edu:
 
  On Tue, Dec 13, 2011 at 8:58 AM, Richard Wallis
  richard.wal...@talis.comwrote:
 
 
  However, I think you are thinking in the right direction - I am
  resigning myself to just using the word 'description'.
 
 
  Q: In your definition, can *descriptions* be put into 1:1 correspondence
  with records (where a record is an atomic asserted set of propositions about
  a resource)?
 
 Yes, I realize that you were asking Richard, but I'm a bit forward, as
 we know. I do NOT see a description as atomic in the sense that a
 record is atomic. A record has rigid walls, a description has
 permeable ones. A description always has the POTENTIAL to have a bit
 of unexpected data added; a record cuts off that possibility.
 
 That said, I am curious about the permeability of the edges of a named
 graph. I don't know their degree of rigidity in terms of properties
 allowed.
 
 kc
 
 
  Simon
 
 
 
 
 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-13 Thread Peter Noerr
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
 Richard Wallis
 Sent: Tuesday, December 13, 2011 3:16 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
 
 On 13 December 2011 22:17, Peter Noerr pno...@museglobal.com wrote:
 
  I agree with Karen below that a record seems more bounded and static,
  whereas a description varies according to need. And that is the
  distinction I was trying to get at: that the item stored in some
  database is everything unique about that entity - and is static, until
  some data actually changes, whereas the description is built at run
  time for the user and may contain some data from the item record, and
  some aggregated from other, linked, item records. The records all have
  long term existence in databases and the like, whereas the description
  is a view of all that stored data appropriate for the moment. It will
  only be stored as a processing intermediate result (as a record, since
  its contents are now fixed), and not long term, since it would be
  broken up to bits of entity data and stored in a distributed linked
  fashion (much like, as I understand it, the BL did when reading MARC
  records and storing them as entity updates.)
 
 
 Yes.  However those descriptions have the potential to be as permanent as the 
 records that they were
 derived from.  As in the BL's case where the RDF is stored, published and 
 queried in [Talis]
 Kasabi.com:
 http://kasabi.com/dataset/british-national-bibliography-bnb


I would argue that they are stored permanently as multiple records holding the 
data about each of the individual entities derived from the original single 
MARC record. In my mind (for this discussion) anything that is stored is a 
record. It may be a single agglutinative record such as MARC, or the same data 
may be split amongst records for the work, the author, the subjects, the 
physical instance, the referenced people, etc. But the data for each of those 
is stored in a record unique to that entity (or in records for other entities 
linked to that entity), so the whole data set of attributes gets spread around 
as fields in various records about various entities - and the links between 
them; let us not forget the very real importance of the links for carrying 
data. 

When a user wants to view the information about this title, then a description 
is assembled from all the stored records and presented to the user. It is, 
almost by definition (as I am viewing this), an ephemeral view (a virtual 
record - one which is not stored complete anywhere) for this user. If the user 
stores this record in a store using the same mechanisms and data model, then 
the constituent data values will be dispersed to their entity records again. 
(If the user wants to process the record, then it may well be stored as a 
whole, since it contains all the information needed for whatever the current 
task is, and the processed record may be discarded or stored permanently again 
in a linked data net as data values in various entity records within that 
model. Or it may be stored whole in an old-fashioned, record-oriented 
database.)
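
A minimal sketch of that assembly step (the store layout and field names are 
invented for illustration; nothing here is meant as a real triple-store API):

def assemble_description(store, title_id, wanted=("title", "date")):
    """Build an ephemeral description of one title from distributed entity records.

    store is a hypothetical dict of entity records keyed by ID; each record is
    a dict of attributes plus a "links" dict of {relation: [entity_id, ...]}.
    The result is the virtual record shown to the user - it is never stored,
    and can be re-dispersed into its entity records at any time.
    """
    entity = store[title_id]
    description = {k: v for k, v in entity.items() if k in wanted}
    for relation, targets in entity.get("links", {}).items():
        # pull a display label from each linked entity (author, subject, ...)
        description[relation] = [store[t].get("label") for t in targets if t in store]
    return description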

 
 
 
  Having said all that, I don't like the term description as it
  carries a lot of baggage, as do all the other terms. But I'm stuck for 
  another one.
 
 
 Me too.  I'm still searching for a budget airline term - no baggage!

How about something based on Southwest - where bags fly free! Though I can't 
make any sort of acronym starting with SW!
 
 ~Richard.
 
 --
 Richard Wallis
 Technology Evangelist, Talis
 http://consulting.talis.com
 Tel: +44 (0)7767 886 005
 
 Linkedin: http://www.linkedin.com/in/richardwallis
 Skype: richard.wallis1
 Twitter: @rjw
 IM: rjw3...@hotmail.com


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Peter Noerr
 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
 Coyle
 Sent: Sunday, December 11, 2011 3:47 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
 
 Quoting Richard Wallis richard.wal...@talis.com:
 
 
  You get the impression that the BL chose a subset of their current
  bibliographic data to expose as LD - it was kind of the other way around.
  Having modeled the 'things' in the British National Bibliography
  domain (plus those in related domain vocabularis such as VIAF, LCSH,
  Geonames, Bio, etc.), they then looked at the information held in
  their [Marc] bib records to identify what could be extracted to populate it.
 
 Richard, I've been thinking of something along these lines myself, especially 
 as I see the number of
 translating X to RDF projects go on. I begin to wonder what there is in 
 library data that is
 *unique*, and my conclusion is: not much. Books, people, places, topics: they 
 all exist independently
 of libraries, and libraries cannot take the credit for creating any of them. 
 So we should be able to
 say quite a bit about the resources in libraries using shared data points -- 
 and by that I mean, data
 points that are also used by others. So once you decide on a model (as BL 
 did), then it is a matter of
 looking *outward* for the data to re-use.

Trying to synthesize what Karen, Richard and Simon have bombarded us with here, 
leads me to conclude that linking to existing (or to be created) external data 
(ontologies and representations) is a matter of: being sure what the system's 
current user's context is, and being able to modify the external data brought 
into the user's virtual EMU (see below *** before reading further). I think 
Simon is right that records will increasingly become virtual in that they are 
composed as needed by this user for this purpose at this time. We already see 
this in practice in many uses, from adding cover art to book MARC records to 
just adding summary information to a management-level report. Being able to 
link from a book record to foaf:person and bib:person records and extract data 
elements from each as they are needed right now should not be too difficult. As 
well as a knowledge of the current need, it requires a semantically based 
mapping of the different elements of those people representations. The neat 
part is that the total representation for that person may be expressed through 
both foaf: and bib: facets from a single EMU which contains all things known 
about that person, and so our two requests for linked data may, in fact should, 
be mining the same resource, which will translate the data to the format we ask 
for each time, and then we will combine those representations back into a 
single collapsed data set.

I think Simon (maybe Richard, maybe all of you) was working towards a single 
unique EMU for the entity which holds all unique information about it for a 
number of different uses/scenarios/facets/formats. Of course deciding on what 
is unique and what is obtained from some more granular breakdown is another 
issue. (Some experience with this onion skin modeling lies deep in my past, 
and may need dredging up.)

It is also important, IMHO, to think about the repository form of entity data 
(the EMU) and the transmission form (the data sent to a requesting system when 
it asks for foaf:person data). They are different and have different 
requirements. If you are going to allow all these entity data elements to be 
viewed through a format filter then we have a mixed model, but basically a 
whole-part relationship between the EMU and the transmission form (e.g. the 
full data set contains the person's current address, but the transmitted 
response sends only the city). Argue amongst yourselves about whether an 
address is a separate entity and is linked to or not - it makes a simple 
example to consider it as part of the EMU.
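
A toy version of that whole-part filtering, purely to show the shape of it (the 
facet mappings are invented; a real one would come from the semantic mapping 
work mentioned above):

# Hypothetical facet mappings: EMU attribute -> transmitted element name.
FACETS = {
    "foaf": {"name": "foaf:name", "city": "foaf:based_near"},
    "bib":  {"name": "bib:personName", "birth_year": "bib:birthYear"},
}

def transmission_form(emu, facet):
    """Filter the full EMU down to the transmission form for one facet.

    The EMU may hold the full address, but only the city is ever sent; which
    attributes go out, and under which element names, depends on the facet
    (foaf:, bib:, ...) the requester asked for.
    """
    mapping = FACETS[facet]
    return {element: emu[attribute]
            for attribute, element in mapping.items() if attribute in emu}

# e.g. transmission_form({"name": "A. Person", "city": "Dublin",
#                         "street": "1 High St", "birth_year": 1970}, "foaf")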

All of this requires that we think of the web of data as being composed not of 
static entities with a description which is fixed at any snapshot in time, but 
being dynamic in that what two users see of the same entity may be different at 
exactly the same instant. So not only a descriptive model structure, but also a 
set of semantic mappings, a context resolution transformation, and the system 
to implement it each time a link to related data is followed.

 
 I maintain, however, as per my LITA Forum talk [1] that the subject headings 
 (without talking about
 quality thereof) and classification designations that libraries provide are 
 an added value, and we
 should do more to make them useful for discovery.
 
 
 
  I know it is only semantics (no pun intended), but we need to stop
  using the word 'record' when talking about the future description of 
  'things' or
  entities that are then linked together.   That word has so many built in
  assumptions, especially in the library world.
 

Re: [CODE4LIB] Models of MARC in RDF

2011-12-05 Thread Peter Noerr
See historical comment in text below. But, to look forward -

It seems to me that we should be able to design a model with graceful 
degradation from full MARC data element set (vocabulary if you insist) to a 
core set which allows systems to fill in what they have and, on the receiving 
end, extract what they can find. Each system can work with its own schema, if 
it must, as long as the mapping for its level of detail against whatever 
designated level of detail it wishes to accept in the exchange format is 
created first. Obviously greater levels of detail cannot be inferred from 
lesser, and so many systems would be working with less than the data they would 
like, or create locally, but that is the nature of bibliographic data - it is 
never complete, or it must be processed assuming that is the case.

Using RDF and entity modeling it should be possible to devise a (small) number 
of levels from a basic core set (akin to DC, if not semantically identical) 
through to a 2,500 attribute* person authority record (plus the other bib 
entities), and produce pre-parsers which will massage these to what the ILS (or 
other repository/system) is comfortable with. Since the receiving system is 
fixed for any one installation it does not need the complexity we build into 
our fed search platforms, and converters would be largely re-usable.
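
A sketch of how such nested levels and a pre-parser could work (the level 
contents below are invented purely for illustration; deciding the real ones is 
the committee work mentioned next):

# Each level is a superset of the one below it.
LEVELS = [
    {"title", "creator", "date"},                               # core (DC-like)
    {"title", "creator", "date", "publisher", "subject"},       # intermediate
    {"title", "creator", "date", "publisher", "subject",
     "edition", "extent", "notes"},                              # fuller
]

def degrade(record, target_level):
    """Drop a record to the element set the receiving system accepts.

    Greater detail cannot be inferred from lesser, so this only ever discards;
    a sparse record passed to a richer level simply comes back unchanged (and
    incomplete), which is the nature of bibliographic data anyway.
    """
    allowed = LEVELS[target_level]
    return {element: value for element, value in record.items() if element in allowed}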

So, what about a Russian doll bibliographic schema? (Who gets to decide on what 
goes in which level is for years of committee work - unemployment solved!)


* number obtained from a line count from 
http://www.loc.gov/marc/authority/ecadlist.html - so rather approximate.

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
 Jonathan Rochkind
 Sent: Monday, December 05, 2011 10:57 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Models of MARC in RDF
 
 On 12/5/2011 1:40 PM, Karen Coyle wrote:
 
  This brings up another point that I haven't fully grokked yet: the use
  of MARC kept library data consistent across the many thousands of
  libraries that had MARC-based systems.
 
 Well, only somewhat consistent, but, yeah.
 
  What happens if we move to RDF without a standard? Can we rely on
  linking to provide interoperability without that rigid consistency of
  data models?
 
 Definitely not. I think this is a real issue.  There is no magic to linking 
 or RDF that provides
 interoperability for free; it's all about
 the vocabularies/schemata -- whether in MARC or in anything else.
 (Note different national/regional  library communities used different 
 schemata in MARC, which made
 interoperability infeasible there. Some still do, although gradually people 
 have moved to Marc21
 precisely for this reason, even when Marc21 was less powerful than the MARC 
 variant they started with).

Just a comment about the good old days when we had to work with USMARC, 
UKMARC, DANMARC, MAB1, AUSMARC, and so on. "Interoperability infeasible" was 
not the situation. It was perfectly possible to convert records from one format 
to another - with some loss of data into the less specific format of course. 
Which meant that a round trip was not possible. But major elements were 
present in all and that meant it was practically useful to do it. We did this 
at the British Library when I was there, and we did it commercially as a 
service for OCLC (remember them?) as a commercial ILS vendor. It did involve 
specific coding, and an internal database system built to accommodate the 
variability. 

 
 That is to say, if we just used MARC's own implicit vocabularies, but output 
 them as RDF, sure, we'd
 still have consistency, although we
 wouldn't really _gain_ much. On the other hand, if we switch to a new
 better vocabulary -- we've got to actually switch to a new better vocabulary. 
  If it's just whatever
 anyone wants to use, we've made it VERY difficult to share data, which is 
 something pretty darn
 important to us.
 
 Of course, the goal of the RDA process (or one of em) was to create a new 
 schema for us to
 consistently use. That's the library community effort to maintain a common 
 schema that is more
 powerful and flexible than MARC.  If people are using other things instead, 
 apparently that failed, or
 at least has not yet succeeded.


Re: [CODE4LIB] Examples of visual searching or browsing

2011-10-28 Thread Peter Noerr
This looks really colorful, but how does it aid searching, or browsing?

The pie chart is useful for a collections development librarian to see how the 
collection is distributed across broad subject areas.

How does it help me, a user, searching for books on Dentistry (yes, they are 
there, all 9443 of them), to know that the biggest collections are in Asian 
history and languages (and books)? What functionality does the visualization 
add to the list of topics given below? It's organized by call number (starting 
at 3 o'clock?), so I don't even have alphabetic headings to help. And the 198 
general works, and 375 dictionaries just disappear. 

It looks nice, but exactly what searching purpose does it enhance - either by 
its existence, or over the alternative list display (boring, but complete)?


Peter

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Julia 
 Bauder
 Sent: Friday, October 28, 2011 9:55 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Examples of visual searching or browsing
 
 This is all fabulous, thank you! MapFast and the HathiTrust visualizations 
 are exactly the kinds of
 things I was looking for, and the tree-mapping idea also sounds like a very 
 good one for visualizing
 collections.
 
 Thanks!
 
 On Fri, Oct 28, 2011 at 11:11 AM, Margaret Anderson ande...@tc3.edu wrote:
 
  Take a look at a visualization of HathiTrust works by call number
 
  http://www.hathitrust.org/visualizations_callnumbers
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
  Of Julia Bauder
  Sent: Thursday, October 27, 2011 4:27 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] Examples of visual searching or browsing
 
  Dear fans of cool Web-ness,
 
  I'm looking for examples of projects that use visual(=largely non-text
  and
  non-numeric) interfaces to let patrons browse/search collections.
  Things like the GeoSearch on North Carolina Maps[1], or projects that
  use Simile's Timeline or Exhibit widgets[2] to provide access to
  collections (e.g., what's described here:
  https://letterpress.uchicago.edu/index.php/jdhcs/article/download/59/7
  0), or in-the-wild uses of Recollection[3]. I'm less interested in
  knowing about tools (although I'm never *uninterested* in finding out
  about cool tools) than about production or close-to-production sites
  that are making good use of these or similar tools to provide visual,
  non-linear access to collections. Who's doing slick stuff in this area
  that deserves a look?
 
  Thanks!
 
  Julia
 
  [1] http://dc.lib.unc.edu/ncmaps/search.php
  [2] http://www.simile-widgets.org/
  [3] http://recollection.zepheira.com/
 
 
 
 
  *
 
  Julia Bauder
 
  Data Services Librarian
 
  Interim Director of the Data Analysis and Social Inquiry Lab (DASIL)
 
  Grinnell College Libraries
 
   Sixth Ave.
 
  Grinnell, IA 50112
 
 
 
  641-269-4431
 


Re: [CODE4LIB] JHU integration of PD works

2011-06-16 Thread Peter Noerr
Numerous comments from today's posts.

As to Jonathan on complexity, resources and "we've got it working": indeed you 
have, and it is a cool-looking UI with functionality behind it. I did not mean to 
imply that you had not got a working system, or that anyone else could not do 
it (see Dan's comments on his available JavaScript). The important word in what 
I wrote was "scale". Many of the content providers, ILS vendors, and even 
enterprise software producers who have come to use Muse as a third party 
middleware platform have done so after they have started on this type of work. 
(I include federated search, known item enrichment, record harvesting, record 
conversion and the like as the same type of work here.) They got it written, 
running and started expanding. Then they came to see us once the number of 
Sources reached about 2 dozen. At this point they found that the work of 
maintenance (Karen's concern) was actually starting to take whole numbers of 
FTE programmers, and it was becoming costly. The projections to hundreds of 
Sources were scary financially, yet there was/is demand for the capability. So 
they came to us to take advantage of economies of scale, where 
we can build and fix...and fix...and fix... just once for all our partners. 
That way it works. It also works on a small scale with well known, well defined 
Sources. (More horror stories can wait for the pub.)

Integration with ILSs (we have our system integrated with two, two more were 
integrated but have gone their own way, and we are developing with a fifth, so 
some experience): generally this is technically challenging because the ILS 
catalogue is display software which is built to handle results from a single 
source - the search engine of the catalogue. Thus integrating other results 
into the data stream is just not possible without development. So you have to 
go about it some other way.
 
First Possibility is to layer the extra functionality on top of the OPAC. Then 
this code becomes the OPAC, and a lot of work has to be done to replicate the 
OPAC functions correctly down to the underlying ILS. And some of the ILS 
vendors have a hissy fit about this - just saying. 

Second Possibility is to do what Dan has done and make the extra functionality 
a display-level action. In other words, create a browser-based client which 
does what you want in terms of aggregating records. Again, our experience has 
been that this does not make ILS vendors feel all warm and cuddly, but there is 
not a lot they can do about it - they do not own the users' browsers. A version 
of this approach is what Umlaut does (as I understand it - which could be very 
wrong :-( ), where the additional functionality is server based, but is an 
adjunct to the main OPAC.

Third Possibility is to go right to the backend and put the integration 
between the ILS and the search engine for the OPAC. Thus the OPAC talks 
to the federator, and it queries the OPAC database and any other Sources, 
presenting all the results as one stream, as if from the ILS database. 

Surprisingly (maybe - it was to me a long while ago) the easiest way to do this 
- with tech help - is the third. And it seems to be the one which gives the ILS 
vendors the least qualms. (Some caveats there - so see the reply to Karen's 
concerns below.) With most ILSs running as client-server architectures (as far 
as their DB/search engine are concerned), there is a natural break point to 
make use of. But this is not just a bit of universally applicable JavaScript - 
it is unique to each ILS, and only makes sense in a dedicated installation with 
the technical resources to implement and maintain it, or in a situation like 
ours, where we can make use of that one integration to add access to thousands 
of extra sources, consequently fitting all (so far) user requirements.

Karen's point about approaching Vendors and their concern about stability of 
Source(s). (Well, a lot of comment from others as well, but she started it.) 
This is a concern and, as I said above, one of the reasons why vendors work 
with us. We can guarantee a stable API for the vendor ILS whatever the vagaries 
of the actual Source. And I think that would be vital. The ILS vendors are not 
interested in crafting lots of API or parsing code and fixing it continuously. 
So OL would have to guarantee a stable API, which met minimum functionality 
requirements, and keep it running for at least a dozen years. We are still 
running the first API we produced some 10 years ago as there are deployed 
systems out there which use it and they are not going to be replaced any time 
soon. The users lose out on new functionality (a lot of it!), but cannot or 
will not pay for the upgraded ILS. A subsidiary advantage of this stable third 
party supplier scenario is that Karen's last query (graceful degradation?) is 
our problem, not the ILS's. We have to handle that and 
notify the ILS, and it just 

Re: [CODE4LIB] JHU integration of PD works

2011-06-15 Thread Peter Noerr
I would just like to confirm from years of practical experience that Jonathan 
is right - this is hard technically. Not in principle, but the devil is in the 
details and they are all different, and often change. The very neat addition to 
the JHU catalog that Eric reported on that started this thread 
(https://catalyst.library.jhu.edu/catalog/bib_816990) is an example of what we 
call "secondary searching" and/or "enrichment". 

And it is available - in our commercial software (not a plug - we don't sell 
it, just noting that it is not the sort of thing to try yourself on any scale - 
it takes a lot of resources). Our software is incorporated in the offerings of 
a number of the ILS and content vendors. Admittedly almost exclusively for 
federated searching, but the problems are the same. And Jonathan enumerates 
them pretty well below. So, to answer Karen's question, it can be done if the 
ILS vendors make the functionality available, and the libraries configure it.

Peter

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
 Jonathan Rochkind
 Sent: Wednesday, June 15, 2011 10:34 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] JHU integration of PD works
 
 On 6/15/2011 10:55 AM, Karen Coyle wrote:
 
  I've been struggling with this around the Open Library digital texts:
  how can we make them available to libraries through their catalogs?
  When I look at the install documentation for Umlaut [1](I was actually
  hoping to find a technical requirements list), it's obvious that it
  takes developer chops.
 
 This isn't necessarily un-fixable. I have plans to make it easier --
 it's totally possible to make it easier (largely because Rails, on which
 Umlaut is based, has gotten so much better at being easier to
 install/deploy things and have em Just Work), I just need to find time
 (that I'm having trouble finding) to make the changes.
 
 Eric, as well as Karen,  also asked why no vendors seem interested in
 supplying a product like this -- may be a bit of a chicken and an egg,
 there may not be a market for it -- I have trouble explaining to people
 why Umlaut is actually really cool in the first place, even other
 libraries. Although these conversations help me learn new ways to
 talk/think about it.
 
 So, I can definitely make Umlaut easier to install and run -- but there
 are still going to be some technical craziness, involved with dealing
 with your local metadata in all its local idiosyncrasies, and dealing
 with matching it to 'remote' data in a way that meets local use cases.
 Like I said before, this is inherently imperfect, but that means that
 there are a bunch of choices to make about what imperfect trade-offs you
 want to make, and these inevitably have to do with the nature of your
 local (mostly cataloging) metadata, and the use cases you are supporting.
 
 Really, I'm not sure I have faith in our existing vendors to be able to
 do a good job with it -- this is a really complicated thing that Umlaut
 is trying to do, in the end. (from my experience; it didn't sound that
 complicated at first, but it ends up so. Trouble-shooting problems ends
 up being incredibly complex, because there are so many different systems
 involved, and a bug or bad metadata on any one can mess things up).
 
 So I guess what I'm saying is, if you're talking about Umlaut's approach
 -- it is a technically hard problem in our existing environment.
 (existing environment means our really bad local cataloging metadata,
 our multiple silo's of local metadata, and our pretty awful 'link
 resolver' products with poor API's, etc -- also the third party content
 host's poor metadata, lack of API's, etc.  None of these things are
 changing anytime soon). So if you're talking about this approach in
 particular, when Erik asks is it technical or is political -- my
 experience with Umlaut definitely definitely says 'technical', not
 'political'. I've gotten no opposition to what Umlaut's trying to do,
 once people understand it, only dissatisfaction with how well it does it
 (a technical issue).
 
 Jonathan


Re: [CODE4LIB] Group-sourced Google custom search site?

2011-05-11 Thread Peter Noerr
Just curious - what do you mean by "Some way to avoid the site-scrapers who 
populate the troubleshooting pages." (last sentence below)?

I presume you are wishing to avoid the troubleshooting sites which consist 
of nothing more than pages copied from other sites, and look only at the prime 
source pages for information?

Peter

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cindy 
 Harper
 Sent: Monday, May 02, 2011 2:15 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] Group-sourced Google custom search site?
 
 That reminds me - I was looking last week into the possibility of making a
 Google custom search site with either a whitelist of trusted technology
 sites, or a blacklist of sites to exclude.  I haven't looked into whether
 the management of that could be group-sourced, but maybe someone else here
 has thought about this.  I haven't looked into the terms of service of
 custom search sites, either.  But of course slashdot was high on the
 whitelist.  I was thinking about sites for several purposes - general
 technology news and opinion, or specific troubleshooting / programming
 sites.  Some way to avoid the site-scrapers who populate the troubleshooting
 pages.
 
 
 Cindy Harper, Colgate U.


Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Since we generally return results asynchronously to client systems from our 
MSSE (fed/meta/broadcast/aggregated/parallel/Multi-Server/Search Engine) I 
would just point out that we use other protocols than SRU when doing so. When 
we do use SRU on the client side, then we send back the results in a complete 
set. Otherwise we send them in tranches on a timescale controlled by the client 
system, usually about every 2 seconds.
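
A minimal sketch of the client side of that tranche delivery, in Python - 
fetch_tranche() here stands in for whatever protocol call actually retrieves the 
next batch, so it is an assumption, not a real API:

  import time

  def collect_in_tranches(fetch_tranche, interval=2.0, max_wait=30.0):
      # Poll roughly every `interval` seconds, appending whatever partial
      # results have arrived, until the server says it is finished.
      results, waited = [], 0.0
      while waited <= max_wait:
          tranche, finished = fetch_tranche()  # -> (new records, done flag)
          results.extend(tranche)
          if finished:
              break
          time.sleep(interval)
          waited += interval
      return results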

Obviously an SRU-async protocol is possible, but would it be used? As an MSSE we 
would use it to get results from Sources, so they could be processed earlier 
(smaller response time) and more smoothly. But that would require that Source 
servers implement it, and what would their incentive be to do so? 

For direct use with end users it would mean a browser client capable of 
retrieving and managing the partial data is needed. Middleware systems (between 
the MSSE and the user) would need to support it, and pass the benefit to the 
user. Any system doing heavy analysis of the results would probably not want 
(and may not be able) to start that analysis until all the results are 
obtained, because of the added messiness of handling partial result sets from 
multiple Sources (it is messy - believe me). 

I would be very happy to see such a protocol (and have it implemented), and if 
Jakub implemented browser code to handle that end, then the users could benefit.

Peter

Peter Noerr
CTO. MuseGlobal
www.museglobal.com

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jakub Skoczen
 Sent: Tuesday, May 18, 2010 12:51 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
 drafts
 
 On Tue, May 18, 2010 at 9:17 PM, Ray Denenberg, Library of Congress
 r...@loc.gov wrote:
  First, no. There are extensibility features in SRU but nothing that
 would
  help here.
 
   Actually, Jonathan, what I thought you were suggesting was the
 creation of a
  (I hesitate to say it) metasearch engine. I use that term because it
 is what
  NISO called it, when they started their metasearch initiative five or
 so
  years ago, to create a standard for a metasearch engine, but they got
  distracted and the effort really came to nothing.
 
 I'm not sure if Jonathan was suggesting that but that's exactly what I
 had in mind - using SRU 2.0 as a front-end protocol for a meta-search
 engine. And yes while creating a third-party, SRU-inspired protocol
 for that purpose could work, I see very little value in such exercise.
 I suspect that, as any standard, SRU has certain limitations and, as
 an implementer, you have to work around them but you do end up with an
 obvious gain: standards compliance. SRU-inspired protocol is not quite
 the same thing, and it's probably easier to go all the way and create
 a custom, proprietary protocol.
 
  The premise of the metasearch engine is that there exists a single-
 thread
  protocol, for example, SRU, and the need is to manage many threads,
 which is
  what the metasearch engine would have done if it had ever been
 defined. This
  is probably not an area for OASIS work, but if someone wanted to
 revive the
  effort in NISO (and put it on the right track) it could be useful.
 
  --Ray
 
 
  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf
 Of
  Jonathan Rochkind
  Sent: Tuesday, May 18, 2010 2:56 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
 drafts
 
  Jakub Skoczen wrote:
 
  I wonder if someone, like Kuba, could design an 'extended async
 SRU'
  on top of SRU, that is very SRU like, but builds on top of it to
 add
  just enough operations for Kuba's use case area.  I think that's
 the
  right way to approach it.
 
 
  Is there a particular extensibility feature in the protocol that
  allows for this?
 
  I don't know, but that's not what I was suggesting. I was suggesting
 you
  read the SRU spec, and then design your own SRU-async spec, which
 is
  defined as exactly like SRU 2.0, except it also has the following
  operations, and is identified in an Explain document like X.
 
  Jonathan
 
 
 
 
 --
 
 Cheers,
 Jakub


Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Aha, but we get interleaved results from the different Sources. So the results 
are not all A, all B, all... Even if the results come as complete sets of 
10, we internally collect them asynchronously as they are processed. The 
number of buffers and processing stages is quite large, so the parallel 
processing nature of multi-tasking means that the results get interleaved. It 
is still possible that one set of results comes in so far in advance of 
everything else that it is completely processed before anything else arrives, 
then the display is all A, then the others.

However the major benefit is that the results from all the Sources are there at 
once, so even if the user uses the system to skip from Source to Source, it 
is quicker than running the search on all the Sources individually. And, yes, 
you can individually save a few here, one or two there to make your 
combined chosen few. 
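
To put rough numbers on "quicker" - the response times below are invented purely 
for illustration, not measurements:

  # Five Sources with made-up native response times, in seconds.
  source_times = [1.8, 2.5, 3.1, 4.0, 6.5]
  msse_overhead = 0.75  # processing plus a couple of extra network hops

  serial_native = sum(source_times)                  # 17.9 s: five searches, one after another
  first_display = min(source_times) + msse_overhead  #  2.55 s: first tranche on screen
  everything    = max(source_times) + msse_overhead  #  7.25 s: slowest Source plus overhead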

But, first page only viewing does mean that the fastest Sources get the best 
spots. Is this an incentive to speed up the search systems? (Actually it has 
happened that a couple of the Sources who we showed comparative response time 
to, did use the figures to get funds for hardware replacement.)

Peter

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Wednesday, May 19, 2010 12:45 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
 - OASIS SRU and CQL, access to most-current drafts
 
 Wait, but in the case you suspect is common, where you return results
 as
 soon as the first resource is returned, and subsequent results are
 added
 to the _end_ of the list
 
 I'm thinking that in most of these cases, the subsequent results will
 be
 several pages in, and the user will never even get there. And if the
 majority of users are only looking at results from one resource... why
 do a broadcast multi-server search in the first place?
 
 Peter Noerr wrote:
  However things are a bit different now...  At the risk of opening the
 debate once more and lots of lengthy discussion let me say that our
 experience (as one of the handful of commercial providers of multi-
 server search engines (MSSEs? - it'll never stick, but I like it)) is:
 
  1) Times are not slow for most installations as they are set by
 default to provide incremental results in the fashion Jakub suggests
 (First In, First Displayed). So users see results driven by the time
 of the fastest Source, not the slowest. <contentious statement>This
 means that, on average, getting the results from a MSSE can be faster
 than doing the same search on all of the native sites (just talking
 response times here, not the fact it is one search versus N). Do the
 maths - it's quite fun. </contentious statement>
 
  2) The average delay for just processing the results through modern
 MSSEs is about 0.5 sec. Add to this say another 0.2 for two extra
 network hops and the additional response time to first display is about
 3/4 of a second. This is a time shift all the way down the set of
 results - most of which the user isn't aware of as they are beyond the
 first 10 on screen, and the system allows interaction with those 10
 while the rest are getting their act together. So, under 1 second is
 added to response times which average about 5 seconds. Of course,
 waiting for all the results adds this time to the slowest results.
 
  3) Most users seem happy to get things back faster and not worry too
 much about relevance ranking. To combat the response time issue for
 users who require ranked results, the incremental return can be set to
 show interfiled results as the later records come in and rank within
 the ones displayed to the user. This can be disconcerting, but making
 sure the UI doesn't lose track of the user's focus is helpful. Another
 option is to show that new results are available, and let the user
 manually click to get them incorporated - less intrusive, but an extra
 click!
 
  General experience with the incremental displays shows that users are
 happiest with them when there is an obvious and clear reason for the
 new additions. The most accepted case is where the ranking criterion is
 price, and the user is always happy to see a cheaper item arrive. It
 really doesn't work well for titles sorted alphabetically - unless the
 user is looking for a specific title which should occur at the
 beginning of the list. And these examples illustrate the general point
 - that if the user is focused on specific items at the top of the list,
 then they are generally happy with an updating list, if they are more
 in browse mode, then the distraction of the updating list is just
 that - a distraction, if it is on screen.
 
  Overall our experience from our partner's users is that they would
 rather see things quickly than wait for relevance ranking. I suspect
 partly (can of worms coming) because the existing ranking schemes don't
 make a lot

Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Agreed it is a problem. What MSSEs do (when operating this way) is make this 
issue a response time dependent one. Users themselves make it a Source 
dependent one (they only look at results from the sites they decide to search). 
Ranking algorithms make it an algorithm dependent one (their algorithm will 
determine what is top of the list).

In all cases the results are vying for the few slots that the user will 
actually look at - above the fold, first 3, first page, etc. The problem 
is that all results cannot be first, and we do not have any way to insist the 
user look at all of them and make an informed selection. Anyway this can go all 
the way back to the collection policies of the library and the aggregators and 
even the cussedness of authors in not writing articles on exactly the right 
topic. (bad authors!) 

The MSSEs try to be even-handed about it, but it doesn't always work. Possibly 
saving technologies here are text analysis and faceting. These can help take 
horizontal slices out of the vertically ordered list of results. That means 
the users can select another list which will be ordered a bit differently, and 
with text analysis and facets applied again, give them ways to slice and dice 
those results. But, in the end it requires enough interest from the user to do 
some refinement, and that battles with "good enough".
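
The slicing itself is not exotic; a toy Python version (field names invented) is 
just:

  from collections import Counter

  def facet(results, field):
      # Count the values of one metadata field across the merged result list.
      return Counter(r.get(field, "unknown") for r in results)

  def slice_by(results, field, value):
      # Take a "horizontal slice" out of the vertically ordered list.
      return [r for r in results if r.get(field) == value]

  # e.g. facet(results, "year") or slice_by(results, "source", "PsycINFO")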

Peter

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Walker, David
 Sent: Wednesday, May 19, 2010 1:18 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
 - OASIS SRU and CQL, access to most-current drafts
 
  And if the majority of users are only looking at results
  from one resource... why do a broadcast multi-server
  search in the first place?
 
 More than just a theoretical concern.  Consider this from an article by
 Nina McHale:
 
 [R]eference and instruction staff at Auraria were asked to draw up a
 list of ten or so resources that would be included in a general-focus
 “Quick Search” . . . [h]owever, in practice, the result was
 disappointing. The results returned from the fastest resource were the
 results on top of the pile, and of the twelve resources chosen,
 PsycINFO routinely returned results first. Reference and instruction
 staff rightly felt that this skewed the results for a general query.
 [1]
 
 One library's perspective, and I'm pretty sure they were not using Muse.
 But conceptually the concern would be the same.
 
 --Dave
 
 [1] http://webserviceslibrarian.blogspot.com/2009/01/why-reference-and-
 instruction.html
 
 ==
 David Walker
 Library Web Services Manager
 California State University
 http://xerxes.calstate.edu
 
 From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind [rochk...@jhu.edu]
 Sent: Wednesday, May 19, 2010 12:45 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
 - OASIS SRU and CQL, access to most-current drafts
 
 Wait, but in the case you suspect is common, where you return results
 as
 soon as the first resource is returned, and subsequent results are
 added
 to the _end_ of the list
 
 I'm thinking that in most of these cases, the subsequent results will
 be
 several pages in, and the user will never even get there. And if the
 majority of users are only looking at results from one resource... why
 do a broadcast multi-server search in the first place?
 
 Peter Noerr wrote:
  However things are a bit different now...  At the risk of opening the
 debate once more and lots of lengthy discussion let me say that our
 experience (as one of the handful of commercial providers of multi-
 server search engines (MSSEs? - it'll never stick, but I like it)) is:
 
  1) Times are not slow for most installations as they are set by
 default to provide incremental results in the fashion Jakub suggests
 (First In, First Displayed). So users see results driven by the time
 of the fastest Source, not the slowest. <contentious statement>This
 means that, on average, getting the results from a MSSE can be faster
 than doing the same search on all of the native sites (just talking
 response times here, not the fact it is one search versus N). Do the
 maths - it's quite fun. </contentious statement>
 
  2) The average delay for just processing the results through modern
 MSSEs is about 0.5 sec. Add to this say another 0.2 for two extra
 network hops and the additional response time to first display is about
 3/4 of a second. This is a time shift all the way down the set of
 results - most of which the user isn't aware of as they are beyond the
 first 10 on screen, and the system allows interaction with those 10
 while the rest are getting their act together. So, under 1 second is
 added to response times which average about 5 seconds. Of course,
 waiting for all the results adds this time to the slowest

Re: [CODE4LIB] What do you want out of a frbrized data web service?

2010-04-21 Thread Peter Noerr
For our fed search service we very much echo Jonathan's real-time 
requirements/use case (we don't build indexes, so bulk download is not of 
interest):

access - real-time query (purpose - to enhance data about items found by other 
means)
query - by standard IDs (generally this is known item augmentation, so 
discovery queries by keywords, etc are not so much required)
data format - almost anything standard  (we can translate it into the 
internal data model structure)
big value add - relationships, mainly the upward ones, towards work
data quantity - all details of directly related items, plus 2nd level links, 
possibly all details all the way up to (and including) the work (this is a 
trade-off of processing time on the service side to gather this information, 
and on our side to de-construct vs. the time to set up and manage multiple 
service calls to get the data about individual items in the link chain. In our 
experience it is almost always quicker to get it all-at-once than to send 
repeated messages, even if the total amount of data is less in the latter. But, 
mileage may vary here.) 
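
As a sketch of the kind of single call we have in mind - the base URL, parameter 
names and response shape are all invented for illustration (and I use the 
requests library just for brevity):

  import requests

  def augment_by_id(base_url, id_type, id_value, depth="work"):
      # One round trip asking a (hypothetical) FRBRized service for the matching
      # manifestation plus its upward chain of related items, all in one payload.
      resp = requests.get(f"{base_url}/lookup",
                          params={"idtype": id_type, "id": id_value, "expand": depth},
                          timeout=10)
      resp.raise_for_status()
      return resp.json()

  # record = augment_by_id("https://frbr.example.org", "oclc", "12345678")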


Peter

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Wednesday, April 21, 2010 7:59 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] What do you want out of a frbrized data web
 service?
 
 So, okay, the value added stuff you have will indeed be relationships
 between entities, which is not too unexpected.
 
 So, yes, I would want a real-time query service (for enhancement of
 individual items on display in my system) _as well as_ a bulk download
 (for enhancing my data on indexing).
 
 For real time query, I'd have a specific entity at hand roughly
 corresponding to a 'manifestation'.  I'd want to look it up in your
 system by any identifiers I have (oclcnum, lccn, isbn, issn; any other
 music-related identifiers that are useful?) to find a match. Then I'd
 want to find out it's workset ID (or possibly expression ID?) in your
 system, and be able to find all the OTHER manifestations/expressions in
 those sets, from your system, with citation details about those items.
 (Author, title, publisher, year, etc; also oclcnum/lccn/isbn/issn/etc
 if
 available. Just giving me Marc with everything might be sufficient).
 If you have work identifiers from other systems that correspond to your
 workID (OCLC workID? etc), I'd want to know those.
 
 For bulk download, yeah, I'd just want everything you could give me,
 really.
 
 Some of the details can't really be spec'd in advance, it requires an
 interative process of people trying to use it and seeing what they need.
 I know this makes things hard from a grant-funded project management
 perspective.
 
 Jonathan
 
 Riley, Jenn wrote:
  On 4/20/10 7:18 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 
 
  But first, to really answer the question, we need some more
 information
  from you. What data do you actually have of value? Just saying we
 have
  FRBRized data doesn't really tell me, FRBRized data can be almost
  anything, really.   Can you tell us more about what value you think
  you've added to your data as a result of your FRBRization?  What
 do
  you have that wasn't there before?  Better relationships between
  manifestations?  Something else?
 
 
  Heh, I was intentionally vague in an attempt to avoid skewing the
 discussion
  in certain directions, but I was obviously *too* vague - my apologies.
 Here
  are the sorts of things we'd imagined and are looking to prioritize:
 
  - Give me a list of all manifestations that match some arbitrary
 query terms
  - Given this manifestation identifier, show me all expressions on it
 and
  what works they realize
  - Give me a list of all works that match some arbitrary query terms
  - Given this work identifier, show all expressions and manifestations
 of it
  - Show me all of the people who match some arbitrary query terms
 (women
  composers in Vienna in the 1860s, for example)
  - Which works have expressions with this specific relationship to
 this
  particular known person?
 
  Basically we're exploring when we should support queries as words vs.
  previously-known identifiers, when a response will all be a set of
 records
  for the same entity vs. several different ones with the relationships
  between them recorded, to what degree answering a query will involve
  traversing lots of relationships - stuff like that. Having some real
 use
  cases will help us decide what kind of a service to offer and what
  technology we'll use to implement that service.
 
  We do hope to also be able to publish Linked Data in some form -
 that's
  probably going to come a little later, but it's definitely on the
 list.
 
  To answer one of your other questions, the V/FRBR project is focusing
 on
  musical materials (scores and recordings) in particular, but we hope
 to set
  up frameworks that would be useful for library bibliographic and
 

Re: [CODE4LIB] Works API

2010-03-30 Thread Peter Noerr
For our purposes (federated search) it would be most useful to have as many of 
the available links (OL or other) as possible, and as much information about 
the link as possible. Obvious structural stuff like the type of identifier, 
but also the nature of the linked object (as you suggest full text, scan, 
etc.) This enables the links to be categorized in the user display so they 
can eliminate the ones not of interest, or focus on those that are.

Anything which differentiates the links from the perspective of the user is 
generally useful. In this regard some information about the editions at the 
ends of the links (even just a number and/or date) would be useful, and stop 
systems coming back to OL multiple times for all the linked records only to 
extract and display one or two bits of information. This has got to be the 
worst case for user response time, and almost certainly for load on the OL 
system. So if a certain amount of this information can be statically 
pre-coordinated with the links, or gathered by OL at request time, it has got 
to be more efficient.

For us the format of the records is of little importance as we convert them 
anyway.

Peter

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Karen Coyle
 Sent: Tuesday, March 30, 2010 10:23
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] Works API
 
 Open Library now has Works defined, and is looking to develop an API
 for their retrieval. It makes obvious sense that when a Work is
 retrieved via the API, that the data output would include links to the
 Editions that link to that Work. Here are a few possible options:
 
 1) Retrieve Work information (author, title, subjects, possibly
 reviews, descriptions, first lines) alone
 2) Retrieve Work information + OL identifiers for all related Editions
 3) Retrieve Work information + OL identifiers + any other identifiers
 related to the Edition (ISBN, OCLC#, LCCN)
 4) Retrieve Work information and links to Editions with full text / scans
 
 Well, you can see where I'm going with this. What would be useful?
 
 kc
 
 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet


Re: [CODE4LIB] Works API

2010-03-30 Thread Peter Noerr
I will just add (again) to the request for all links. As Jonathan says the 
client can then decide what to show, how to group them, and so on. 

I had rather sloppily elided things like format of full text into my 
structural information about the link. 

And I second the request that some simple coding (controlled vocabulary anyone?) 
is used for these values so that we clients can determine what we are seeing.

Thanks  -  Peter


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 stuart yeates
 Sent: Tuesday, March 30, 2010 18:20
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Works API
 
 Jonathan Rochkind wrote:
  Karen Coyle wrote:
  The OL only has full text links, but the link goes to a page at the
  Internet Archive that lists all of the available formats. I would
  prefer that the link go directly to a display of the book, and offer
  other formats from there (having to click twice really turns people
  off, especially when they are browsing). So unfortunately, other than
  full text there won't be more to say.
 
  In an API, it would be _optimal_ if you'd reveal all these links, tagged
  with a controlled vocabulary of some kind letting us know what they are,
  so the client can decide for itself what to do with them (which may not
  even be immediately showing them to any user at all, but may be
  analyzing them for some other purpose).
 
 Even better, for those of us who have multiple formats of full text (TEI
 XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs
 to the full text, differentiated using the mime-type.
 
 cheers
 stuart
 --
 Stuart Yeates
 http://www.nzetc.org/   New Zealand Electronic Text Centre
 http://researcharchive.vuw.ac.nz/ Institutional Repository


Re: [CODE4LIB] code4lib / elag2010

2009-06-04 Thread Peter Noerr
As a long time founder member of ELAG I think this is an excellent idea. In its 
early days (for maybe the first 10 years) ELAG was a very technical meeting 
(yes, you *can* be geeky about mainframe software - all we had back then), but 
it has moved from that over time for all the best of reasons. I can't speak 
about the meeting in Bratislava, as I wasn't there, but it has over the years 
become more edu-torial and less cutting edge. I think it would benefit from the 
addition of a whole day of decidedly technical content in addition to the 
review and descriptive papers and its own very strong workshop format.

Peter Noerr

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Frumkin, Jeremy
 Sent: Thursday, June 04, 2009 08:51
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] code4lib / elag2010
 
 Hi Nicolas -
 
 I think what you are planning to do is great, and I personally can't see
 any issues with what you've described. There is a code4lib NW event today
 in Oregon, and there have been a number of regional code4lib events put
 together by members of the community.
 
 To me, holding events like this are exactly what code4lib is all about.
 
 
 -- jaf
 
 ==
 Jeremy Frumkin
 Assistant Dean / Chief Technology Strategist
 University of Arizona Libraries
 
 frumk...@u.library.arizona.edu
 +1 520.307.4548
 ==
 
 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Nicolas Morin
 Sent: Thursday, June 04, 2009 8:34 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] code4lib / elag2010
 
 Hello,
 
 A number of the code4lib community members are based in Europe : some
 of us were at the code4lib conference in the US in February; some of
 us were at the ELAG conference in Bratislava in Europe last April.
 There has been some talks about a code4lib conference in Europe in the
 past. At ELAG in April this idea came up again. But as Eric Lease
 Morgan noted in this forum a while ago on this subject, it would not
 be a good idea to duplicate efforts. ELAG itself is not as hands-on
 and geeky as code4lib : Ross Singer, who was at ELAG this year, said
 he found it more like Access. And while we don't want to duplicate
 efforts, we feel there's a place for a more technically oriented
 event.
 So, after some discussion with the people from ELAG, here's what we
 propose for 2010 : not a Code4lib Europe conference, but an addition
 to the existing ELAG conference : a one-day, pre-conference seminar
 for the code4lib community. Something hands-on, technical. Something
 about Lucene/solr in it's many incarnations was suggested, but we're
 very much open to other suggestions at this stage.
 
 I want to make sure that the code4lib community finds it appropriate
 that we use the code4lib space to setup something that's going to be
 tied to another conference (ELAG). I don't think there's an issue
 here, but I want to make sure no one feels the code4lib brand is
 being inappropriately used.
 
 I also want to make sure that those of you who are interested can
 participate in the setting up of this pre-conference : I opened a
 (more or less blank, at this stage) wiki page for this at
 http://wiki.code4lib.org/index.php/Code4lib/elag2010
 
 The code4lib community members who have expressed interest in setting
 this up so far are : Jakob Voss, Etienne Posthumus, Peter VanBoheemen,
 Till Kinstler and myself. If you're interested in this effort, feel
 free to go edit the wiki page.
 
 You can get more information about ELAG 2009 at
 http://indico.ulib.sk/elag2009
 ELAG 2010 will be hosted by the Finnish national library in Helsinki,
 in June 2010
 
 Cheers,
 Nicolas
 
 --
 Nicolas Morin
 Mobile: +33(0)633 19 11 36
 http://www.biblibre.com


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I am pleased to disagree to various levels of 'strongly' (if we can agree on a 
definition for it :-).

Ross earlier gave a sample of a 'crosswalk' for my MARC problem. What he 
supplied

-snip
We could have something like:
http://purl.org/DataFormat/marcxml
  . skos:prefLabel MARC21 XML .
  . skos:notation info:srw/schema/1/marcxml-v1.1 .
  . skos:notation info:ofi/fmt:xml:xsd:MARC21 .
  . skos:notation http://www.loc.gov/MARC21/slim; .
  . skos:broader http://purl.org/DataFormat/marc .
  . skos:description ... .

Or maybe those skos:notations should be owl:sameAs -- anyway, that's not really 
the point.  The point is that all of these various identifiers would be valid, 
but we'd have a real way of knowing what they actually mean.  Maybe this is 
what you mean by a crosswalk.
--end

Is exactly what I meant by a crosswalk. Basically a translating dictionary 
which allows any entity (system or person) to relate the various identifiers.
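
In code the translating dictionary is nothing grander than this - the purl.org 
key is just Ross's example and the alternates are the identifiers quoted above:

  CROSSWALK = {
      "http://purl.org/DataFormat/marcxml": {
          "info:srw/schema/1/marcxml-v1.1",
          "info:ofi/fmt:xml:xsd:MARC21",
          "http://www.loc.gov/MARC21/slim",
      },
  }

  def canonical_for(identifier):
      # Resolve any known alternate identifier to its canonical URI.
      for canonical, alternates in CROSSWALK.items():
          if identifier == canonical or identifier in alternates:
              return canonical
      return None  # an unknown scheme - exactly the scalability worry below

  # canonical_for("info:ofi/fmt:xml:xsd:MARC21") -> "http://purl.org/DataFormat/marcxml"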

I would love to see a single unified set of identifiers; my life as a wrangler 
of record semantics would be so much easier. But I don't see it happening. 

That does not mean we should not try. Even a unification in our space (and if 
not in the library/information space, then where? as Mike said) reduces the 
larger problem. However I don't believe it is a scalable solution (which may 
not matter if all of a group of users agree, then why not leave them to it) as, 
at any time one group/organisation/person/system could introduce a new scheme, 
and a world view which relies on unified semantics would no longer be viable.

Which means until global unification on an object (better a (large) set of 
objects) is achieved it will be necessary to have the translating dictionary 
and systems which know how to use it. Unification reduces Ray's list of 15 
alternative uris to 14 or 13 or whatever. As long as that number is 1 
translation will be necessary. (I will leave aside discussions of massive 
record bloat, continual system re-writes, the politics of whose view prevails, 
the unhelpfulness of compromises for joint solutions, and so on.)

Peter

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Mike Taylor
 Sent: Friday, May 01, 2009 02:36
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
 Them All
 
 Jonathan Rochkind writes:
   Crosswalk is exactly the wrong answer for this. Two very small
   overlapping communities of most library developers can surely agree
   on using the same identifiers, and then we make things easier for
   US.  We don't need to solve the entire universe of problems. Solve
   the simple problem in front of you in the simplest way that could
   possibly work and still leave room for future expansion and
   improvement. From that, we learn how to solve the big problems,
   when we're ready. Overreach and try to solve the huge problem
   including every possible use case, many of which don't apply to you
   but SOMEDAY MIGHT... and you end up with the kind of
   over-abstracted over-engineered
   too-complicated-to-actually-catch-on solutions that... we in the
   library community normally end up with.
 
 I strongly, STRONGLY agree with this.  It's exactly what I was about
 to write myself, in response to Peter's message, until I saw that
 Jonathan had saved me the trouble :-)  Let's solve the problem that's
 in front of us right now: bring SRU into harmony with OpenURL in this
 respect, and the very act of doing so will lend extra legitimacy to
 the agreed-on identifiers, which will then be more strongly positioned
 as The Right Identifiers for other initiatives to use.
 
  _/|_  ___
 /o ) \/  Mike Taylorm...@indexdata.com
 http://www.miketaylor.org.uk
 )_v__/\  You cannot really appreciate Dilbert unless you've read it in
the original Klingon. -- Klingon Programming Mantra


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-05-01 Thread Peter Noerr
I agree with Ross wholeheartedly. Particularly in the use of an RDF based 
mechanism to describe, and then have systems act on, the semantics of these 
uniquely identified objects. Semantics (as in Web) has been exercising my 
thoughts recently and the problems we have here are writ large over all that the SW 
people are trying to achieve. Perhaps we can help...

Peter 

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Friday, May 01, 2009 13:40
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule
 Them All
 
 Ideally, though, if we have some buy in and extend this outside our
 communities, future identifiers *should* have fewer variations, since
 people can find the appropriate URI for the format and use that.
 
 I readily admit that this is wishful thinking, but so be it.  I do
 think that modeling it as SKOS/RDF at least would make it attractive
 to the Linked Data/Semweb crowd who are likely the sorts of people
 that would be interested in seeing URIs, anyway.
 
 I mean, the worst that can happen is that nobody cares, right?
 
 -Ross.
 
 On Fri, May 1, 2009 at 3:41 PM, Peter Noerr pno...@museglobal.com wrote:
  I am pleased to disagree to various levels of 'strongly (if we can agree
 on a definition for it :-).
 
  Ross earlier gave a sample of a crossw3alk' for my MARC problem. What he
 supplied
 
  -snip
  We could have something like:
  http://purl.org/DataFormat/marcxml
   . skos:prefLabel MARC21 XML .
   . skos:notation info:srw/schema/1/marcxml-v1.1 .
   . skos:notation info:ofi/fmt:xml:xsd:MARC21 .
   . skos:notation http://www.loc.gov/MARC21/slim; .
   . skos:broader http://purl.org/DataFormat/marc .
   . skos:description ... .
 
  Or maybe those skos:notations should be owl:sameAs -- anyway, that's not
 really the point.  The point is that all of these various identifiers would
 be valid, but we'd have a real way of knowing what they actually mean.
  Maybe this is what you mean by a crosswalk.
  --end
 
  Is exactly what I meant by a crosswalk. Basically a translating
 dictionary which allows any entity (system or person) to relate the various
 identifiers.
 
  I would love to see a single unified set of identifiers, my life as a
 wrangled of record semantics would be s much easier. But I don't see it
 happening.
 
  That does not mean we should not try. Even a unification in our space
 (and if not in the library/information space, then where? as Mike said)
 reduces the larger problem. However I don't believe it is a scalable
 solution (which may not matter if all of a group of users agree, they why
 not leave them to it) as, at any time one group/organisation/person/system
 could introduce a new scheme, and a world view which relies on unified
 semantics would no longer be viable.
 
  Which means until global unification on an object (better a (large) set
 of objects) is achieved it will be necessary to have the translating
 dictionary and systems which know how to use it. Unification reduces Ray's
 list of 15 alternative uris to 14 or 13 or whatever. As long as that number
 is 1 translation will be necessary. (I will leave aside discussions of
 massive record bloat, continual system re-writes, the politics of whose
 view prevails, the unhelpfulness of compromises for joint solutions, and so
 on.)
 
  Peter
 
  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
  Mike Taylor
  Sent: Friday, May 01, 2009 02:36
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to
 Rule
  Them All
 
  Jonathan Rochkind writes:
    Crosswalk is exactly the wrong answer for this. Two very small
    overlapping communities of most library developers can surely agree
    on using the same identifiers, and then we make things easier for
    US.  We don't need to solve the entire universe of problems. Solve
    the simple problem in front of you in the simplest way that could
    possibly work and still leave room for future expansion and
    improvement. From that, we learn how to solve the big problems,
    when we're ready. Overreach and try to solve the huge problem
    including every possible use case, many of which don't apply to you
    but SOMEDAY MIGHT... and you end up with the kind of
    over-abstracted over-engineered
    too-complicated-to-actually-catch-on solutions that... we in the
    library community normally end up with.
 
  I strongly, STRONGLY agree with this.  It's exactly what I was about
  to write myself, in response to Peter's message, until I saw that
  Jonathan had saved me the trouble :-)  Let's solve the problem that's
  in front of us right now: bring SRU into harmony with OpenURL in this
  respect, and the very act of doing so will lend extra legitimacy to
  the agreed-on identifiers, which will then be more strongly positioned
  as The Right Identifiers

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Peter Noerr
Some further observations. So far this threadling has mentioned only trying to 
unify two different sets of identifiers. However there are a much larger number 
of them out there (and even larger numbers of schemas and other 
standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about)
 and the problem exists for any of these things (identifiers, etc.) where there 
are more than one of them. So really unifying two sets of identifiers, while 
very useful, is not actually going to solve much.

Is there any broader methodology we could approach which potentially allows 
multiple unifications or (my favourite) cross-walks. (Complete unification 
requires everybody agrees and sticks to it, and human history is sort of not on 
that track...) And who (people and organizations) would undertake this?

Ross' point about a lightweight approach being necessary for any sort of adoption 
is well taken, but this is a problem (which plagues all we do in federated search) which 
cannot just be solved by another registry. Somebody/organisation has to look at 
the identifiers or whatever and decide that two of them are identical or, 
worse, only partially overlap and hence scope has to be defined. In a syntax 
that all understand of course. Already in this thread we have the sub/super 
case question from Karen (in a post on the openurl (or Z39.88 sigh - 
identifiers!) listserv). And the various identifiers for MARC (below) could 
easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of 
one (computer understandable) syllable what the differences are. 

I'm not trying to make problems. There are problems and this is only a small 
subset of them, and they confound us every day. I would love to adopt standard 
definitions for these things, but which Standard? Because anyone can produce 
any identifier they like, we have decided that the unification of them has to 
be kept internal where we at least have control of the unifications, even if 
they change pretty frequently.

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Thursday, April 30, 2009 12:00
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
 All
 
 Hello everybody.  I apologize for the crossposting, but this is an
 area that could (potentially) affect every one of these groups.  I
 realize that not everybody will be able to respond to all lists,
 but...
 
 First of all, some back story (Code4Lib subscribers can probably skip
 ahead):
 
 Jangle [1] requires URIs to explicitly declare the format of the data
 it is transporting (binary marc, marcxml, vcard, DLF
 simpleAvailability, MODS, EAD, etc.).  In the past, it has used it's
 own URI structure for this (http://jangle.org/vocab/formats#...) but
 this was always been with the intention of moving out of the
 jangle.org into a more generic space so it could be used by other
 initiatives.
 
 This same concept came up in UnAPI [2] (I think this thread:
 http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-
 March/thread.html#682
 discusses it a bit - there is a reference there that it maybe had come
 up before) although was rejected ultimately in favor of an (optional)
 approach more in line with how OAI-PMH disambiguates metadata formats.
  That being said, this page used to try to set sort of convention
 around the UnAPI formats:
 http://unapi.stikipad.com/unapi/show/existing+formats
 But it's now just a squatter page.
 
 Jakob Voss pointed out that SRU has a schema registry and that it
 would make sense to coordinate with this rather than mint new URIs for
 things that have already been defined there:
 http://www.loc.gov/standards/sru/resources/schemas.html
 
 This, of course, made a lot of sense.  It also made me realize that
 OpenURL *also* has a registry of metadata formats:
  http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
 
 The problem here is that OpenURL and SRW are using different info URIs
 to describe the same things:
 
 info:srw/schema/1/marcxml-v1.1
 
 info:ofi/fmt:xml:xsd:MARC21
 
 or
 
 info:srw/schema/1/onix-v2.0
 
 info:ofi/fmt:xml:xsd:onix
 
 The latter technically isn't the same thing since the OpenURL one
 claims it's an identifier for ONIX 2.1, but if I wasn't sending this
 email now, eventually SRU would have registered
 info:srw/schema/1/onix-v2.1
 
 There are several other examples, as well (MODS, ISO20775, etc.) and
 it's not a stretch to envision more in the future.
 
 So there are a couple of questions here.
 
 First, and most importantly, how do we reconcile these different
 identifiers for the same thing?  Can we come up with some agreement on
 which ones we should really use?
 
 Secondly, and this gets to the reason why any of this was brought up

Re: [CODE4LIB] exact title searches with z39.50

2009-04-29 Thread Peter Noerr
To sidestep the issue of strict/relaxed and face the real world of spotty 
implementation of standards (and it seems to apply however non/arcane they are) 
we provide a configurable strictness flag and the ability to have 
non-supported indexes and some functions mapped to supported ones on a Source 
by Source basis. Admins can allow users to have this strict/relaxed switch or 
not. And users can apply it or not. In both cases the majority choice is not to 
(i.e. relaxed is used).
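
A toy version of the idea - the index names and configuration shape are invented, 
not our product's actual format:

  # Per-Source mapping of non-supported indexes onto supported ones.
  SOURCE_INDEX_MAP = {
      "some_target": {"title_exact": "title_words"},
  }

  def rewrite_query(source, index, term, strict=False):
      # Strict: an unsupported index fails the search. Relaxed: map it quietly.
      mapping = SOURCE_INDEX_MAP.get(source, {})
      if index in mapping:
          if strict:
              raise ValueError(source + " does not support index " + index)
          index = mapping[index]
      return index, term

  # rewrite_query("some_target", "title_exact", "moby dick")              -> ("title_words", "moby dick")
  # rewrite_query("some_target", "title_exact", "moby dick", strict=True) -> raises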

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 28, 2009 08:43
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] exact title searches with z39.50
 
 It can be a chicken-egg thing too.  Maybe more users would be doing more
 sophisticated searches if they actually _worked_.
 
 Plus I know that I could write systems to use federated search to embed
 certain functionality in certain places, if more sophisticated searches
 worked more reliably.
 
 Walker, David wrote:
  I'm not sure it's a _big_ mess, though, at least for metasearching.
 
  I was just looking at our metasearch logs this morning, so did a quick
 count: 93% of the searches were keyword searches.  Not a lot of exactness
 required there.  It's mostly in the 7% who are doing more specific searches
  (author, title, subject) where the bulk of the problems lie, I suspect.
 
  --Dave
 
  ==
  David Walker
  Library Web Services Manager
  California State University
  http://xerxes.calstate.edu
  
  From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Ray
 Denenberg, Library of Congress [r...@loc.gov]
  Sent: Tuesday, April 28, 2009 8:32 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] exact title searches with z39.50
 
  Right, Mike. There is a long and rich history of the debate between loose
  and strict interpretation, in the world at large, and in particular, within
  Z39.50, this debate raged from the late 1980s throughout the 90s.  The
  faction that said "If you can't give the client what it asks for, at least
  give them something; make them happy" was almost religious in its zeal.
  Those who said "If you can't give the client what it asks for, be honest
  about it; give them good diagnostic information, tell them a better way to
  formulate the request, etc. But don't pretend the transaction was a success
  if it wasn't" was shouted down most every time.   I can't predict, but I'm
  just hoping that lessons have been learned from the mess that that mentality
  got us into.
 
  --Ray
 
  - Original Message -
  From: Mike Taylor m...@indexdata.com
  To: CODE4LIB@LISTSERV.ND.EDU
  Sent: Tuesday, April 28, 2009 10:43 AM
  Subject: Re: [CODE4LIB] exact title searches with z39.50
 
 
 
  Ray Denenberg, Library of Congress writes:
 
   The irony is that Z39.50 actually makes _much_ more effort to
  specify semantics than most other standards -- and yet still
  finds itself in the situation where many implementations do not
  respond correctly to the BIB-1 attribute 6=3
  (completeness=complete field) which is how Eric should be able to
  do what he wants here.
 
  Not that I have any good answers to this problem ... but I DO
  know that inventing more and more replacement standards it NOT
  the answer.  Everything that's come along since Z39.50 has
  suffered from exactly the same problem but more so.
 
  I think this remains to be seen for SRU/CQL, in particular for the
  example at hand, how to search for exact title.  There are two
  related issues: one, how arcane the standard is, and two, how
  closely implementations conform to the intended semantics. And
  clearly the first has a bearing on the second.
 
   And even I would say that Z39.50 is a bit on the arcane side when
  it comes to formulating a query for exact title. With SRU/CQL there
  is an exact relation ('exact' in 1.1, '==' in 1.2).  So I would
  think there is less excuse for a server to apply a creative
  interpretation. If it cannot support exact title it should fail
  the search.
 
  IMHO, this is where it breaks down 90% of the time.  Servers that
   can't do what they're asked should say "I can't do that", but -- for
  reasons that seem good at the time -- nearly no server fails requests
  that it can sort of fulfil.  Nine out of ten Z39.50 servers asked to
  do a whole-field search and which can't do it will instead do a word
  search, because it's better to give the user SOMETHING.  I bet the
  same is true of SRU servers.  (I am as guilty as anyone else, I've
  written servers like that.)
 
  The idea that it's better to give the user SOMETHING might -- might
  -- have been true when we mostly used Z39.50 servers for interactive
  sessions.  Now that they are mostly used as targets in metasearching

Re: [CODE4LIB] Serials Solutions Summon

2009-04-21 Thread Peter Noerr
From one of the Federated Search vendor's perspective... 

It seems in the broader web world we in the library world have lost 
metasearch. That has become the province of those systems (mamma, dogpile, 
etc.) which search the big web search engines (G,Y,M, etc.) primarily for 
shoppers and travelers (kayak, mobissimo, etc.) and so on. One of the original 
differences between these engines and the library/information world ones was 
that they presented results by Source - not combined. This is still evident in 
a fashion in the travel sites where you can start multiple search sessions on 
the individual sites.

We use Federated Search for what we do in the library/information space. It 
equates directly to Jonathan's Broadcast Search which was the original term I 
used when talking about it about 10 years ago. Broadcast is more descriptive, 
and I prefer it, but it seems an uphill struggle to get it accepted.

Fed Search has the problem of Ray's definition of Federated, to mean a bunch 
of things brought together. It can be broadcast search (real time searching of 
remote Sources and aggregation of a virtual result set), or searching of a 
local (to the searcher) index which is composed of material federated from 
multiple Sources at some previous time. We tend to use the term Aggregate 
Index for this (and for the Summon-type index) Mixed content is almost a 
given, so that is not an issue. And Federated Search systems have to undertake 
in real time the normalization and other tasks that Summon will be (presumably) 
putting into its aggregate index.

A problem in terminology we come across is the use of "local" (notice my 
careful caveat in its use above). It is used to mean local to the searcher (as 
in the aggregate/meta index above), or it is used to mean local to the original 
documents (i.e. at the native Source).

I can't imagine this has done more than confirm that there is no agreed 
terminology - which we sort of all knew. So we just do a lot of explaining - 
with pictures - to people.

Peter Noerr


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com




 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 21, 2009 08:59
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Serials Solutions Summon
 
 Ray Denenberg, Library of Congress wrote:
 
  Leaving aside metasearch and broadcast search (terms invented more
 recently)
  it  is a shame if federated has really lost its distinction
  fromdistributed.  Historically, a federated database is one that
  integrates multiple (autonomous) databases so it is in effect a
 virtual
  distributed database, though a single database.I don't think
 that's a
  hard concept and I don't think it is a trivial distinction.
 
 
 For at least 10 years vendors in the library market have been selling
 us
 products called federated search which are in fact
 distributed/broadcast search products.
 
 If you want to reclaim the term federated to mean a local index, I
 think you have a losing battle in front of you.
 
 So I'm sticking with broadcast search and local index.  Sometimes
 you need to use terms invented more recently when the older terms have
 been used ambiguously or contradictorily.  To me, understanding the two
 different techniques and their differences is more important than the
 terminology -- it's just important that the terminology be understood.


Re: [CODE4LIB] Dutch Code4Lib

2009-01-26 Thread Peter Noerr
You can find out about the current (2009) meeting here 
(http://library.wur.nl/elag2008/elag2009.html). The program is set, but ELAG is 
built round workshops and it is probably possible to add a new one, even at 
this late date. Contact the program committee.
 
ELAG was formed about 25 years ago by people working for the national libraries 
of Europe and the larger universities who were all struggling to build 
library automation systems and facing a number of real technical problems. Its 
original aim was, and is, to allow exchange of techniques and experiences. Over 
the years it has changed as technology was developed, often by the ELAG member 
organisations, and the world changed. It has a meeting every year in a 
different location within Europe to encourage a wider audience and enable a 
secondary aim of education in each location for people who were not able to 
travel.
 
Peter Noerr
MuseGlobal
(ex British Library - founder member of ELAG)



From: Code for Libraries on behalf of Edward M. Corrado
Sent: Sat 2009-01-24 17:46
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Dutch Code4Lib



I really don't know much about ELAG besides I'm going there. I think
they have the program set, or just about set, for this year so I don't
know if anything could be done formally in 2009 but maybe an informal
thing could be arranged the day before or the day after? At the least
maybe contacts can be made for 2010?

Does anyone know more about ELAG?

Edward


On 1/23/09, Hamparian,Don hampa...@oclc.org wrote:
 Do you see any opportunities to partner with them for a European meeting?
 Or is that more trouble than it's worth?


 -Original Message-
 From: Code for Libraries on behalf of Ross Singer
 Sent: Thu 1/22/2009 4:06 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Dutch Code4Lib

 Eric, this is a good point.  I will be at ELAG this year, and I think
 Ed Corrado will, too.

 Past presentations look to be very in line with Code4lib and, in fact,
 it was billed to me as "If you think of Access or Code4Lib but in a
 scenic European setting with great beer then you'll have a good idea
 of what we are planning" by Ron Davies, one of the coordinators.

 -Ross.

 On Thu, Jan 22, 2009 at 2:33 PM, Eric Lease Morgan emor...@nd.edu wrote:
 On 1/22/09 1:02 PM, Ed Summers e...@pobox.com wrote:

 Wow, this sounds too good to be true. Perhaps this is premature, but
 do you think there might be interest in hosting a code4lib2010 in the
 Netherlands? (he asks selfishly).

 On another note, there is already a library conference that is apparently
 very similar to the Access tradition and Code4Lib that takes place in
 Europe, and I think it is called European Library Automation Group (ELAG).
 See:

  http://indico.ulib.sk/MaKaC/conferenceDisplay.py?confId=5

 While I would love to have a Code4Lib thang in Europe, maybe there is
 something already in place. This year it is in Bratislava (Slovakia). Next
 year I believe it takes place somewhere in Norway.

 --
 Eric Morgan




Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]

2008-07-11 Thread Peter Noerr
Hi Steve,

Thanks for a full reply.

We actually do combine data within enterprises, including from their ILS
and subscription Sources (article databases), and internal repositories.
Of course we claim we do it well - and I think we do. A library
background will enable you to face almost any shape of data with aplomb,
if not equanimity.

Data from varied sources is varied in structure, type of content and
level of detail, as you say. It *is* possible to combine it, but it
works best when there is some sort of commonality across the sources.
Fortunately most people when searching provide that focus, so the
theoretical problem is very rarely a practical one - and this business
is all about practical solutions. We do actually have a fair number of
the enterprise search engine vendors as partners where we act as a
selective harvesting capability for them and convert the syntax and
semantics of the harvested records into a uniformity they can easily
ingest and work their indexing magic on.
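
As an illustrative aside (not MuseGlobal's actual pipeline), a toy sketch of
that normalization step: records harvested from different sources arrive with
different field names and shapes, and each is mapped onto one uniform record
that an indexer can easily ingest. The field names below are invented examples.

    # Toy normalization of heterogeneous harvested records to one schema.
    FIELD_MAP = {
        "245a": "title",   "dc:title": "title",     "Title": "title",
        "100a": "creator", "dc:creator": "creator", "Author": "creator",
        "260c": "date",    "dc:date": "date",       "PubYear": "date",
    }

    def normalize(raw_record, source_name):
        uniform = {"source": source_name}
        for field, value in raw_record.items():
            key = FIELD_MAP.get(field)
            if key and key not in uniform:    # first value wins in this toy version
                uniform[key] = value
        return uniform

    harvested = [
        ({"245a": "Cataloguing rules", "100a": "Smith, J."}, "ils"),
        ({"dc:title": "Cataloguing rules", "dc:date": "2008"}, "repository"),
    ]
    print([normalize(record, source) for record, source in harvested])

The real work, of course, is in the per-source mappings and in reconciling
values that disagree; the sketch only shows where that work sits.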

Fence sitting has a long and honourable tradition (both in the UK and
the US), and we 'back both horses' ourselves by being in both the
federated search and content integration space. Thus we are involved in
both the just-in-case harvesting and the just-in-time federated searching.

Final thought is that almost everybody we have dealt with is a special
case - most of them in the nicest possible way - so, even for systems
like ours, customization is the order of the day. But that's what
computers allow us to do - adapt to users.

Peter  

 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Steve Oberg
 Sent: Friday, July 11, 2008 12:15 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Enterprise Search and library collection
 [SEC=UNCLASSIFIED]
 
 Peter,
 
  Use a search engine and create an aggregated database/index of all the
  material from the organization, or use a federated search system to
  search the repositories/catalogs/databases/etc. in real time? Did you
  consider both? And why the choice you made?
 
 
 I was not involved in the initial planning. I came in sort of halfway
 through and had to make a lot of the initial planning decisions work.
 (Even while I disagreed with some of those decisions.)  Again, my
 perspective relates mostly to use of catalog data.  However, I would add
 that we did in fact have a federated search tool when I came but quickly
 discarded it because it couldn't do the more limited functionality we were
 hoping for it to accomplish (present best options to users for where to
 search among our databases and collections according to subject), let
 alone aggregate or search across disparate data repositories.
 
 Personally I find it very difficult to believe that a federated search
 such as what you provide at MuseGlobal can do this sort of enterprise
 combination of data well.  The data is not well structured (except for
 catalog data) and includes an extreme range of completeness and little
 commonality.  What is interesting to note, however, is that on the one
 hand, a vendor such as yourself may claim that you can do this sort of
 stuff well (I'm not saying you said that, just that you might say that).
 On the other hand I find it interesting to note that the enterprise search
 tool vendor we have, coming from a completely different market and
 perspective, would readily claim they can do all that library stuff --
 that they do in fact offer true federated search. Which in my personal
 opinion isn't true at all.
 
 But ideally I would answer your question in this way. I think there
 should be a combination of the two approaches, that this would be more
 practical and workable than just one or the other.  How's that for
 sitting on the fence :-)
 
 
  Build vs. Buy? It obviously has taken Steve and his colleagues a lot of
  hard work to produce a nice looking system (except for all those big
  black bits on the screen!) and it obviously takes maintenance (it is
  'fragile'). Do you think it was/is worth it and if so why?
 
 
 My answer is, it is too soon to tell.  There are many reasons why our
 implementation is probably unique (and I don't mean to imply that it is
 better than someone else's, just that I doubt it could readily be
 replicated elsewhere).  We have a number of very different requirements
 and use cases than what some other library settings might have.  We have
 a large number of constraints on the IT side.  We have had to do a lot of
 custom stuff as a result. This is probably why it is fragile, more than
 because of deficiencies in any one piece such as the search tool itself.
 
 But we are still, in my view, only at the very early stages of assessing
 the whole package's value for our users.  And we have very particular,
 demanding users.
 
 In sum, we have had to buy AND build and so it isn't, again, a question
 of one versus the other.
 
 Steve


Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]

2008-07-10 Thread Peter Noerr
Hi Steve and Renata,

First the declaration of interest: I am the CTO of a federated search
system company. However I am not trying to suggest you should use our
(or any) federated search system (so I will, coyly, not attach a
signature to this email).

I am interested in your comments on either or both of two questions:

Use a search engine and create an aggregated database/index of all the
material from the organization, or use a federated search system to
search the repositories/catalogs/databases/etc. in real time? Did you
consider both? And why the choice you made?
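
For contrast with the broadcast-search sketch earlier in this archive, an
equally toy illustration of the first option: harvest everything ahead of time
("just in case") into one local index, and run searches against that index
rather than against the live sources. The documents and fields are invented.

    # Toy aggregated local index: build one inverted index up front, then
    # search it directly instead of querying the sources in real time.
    from collections import defaultdict

    documents = [
        {"id": 1, "title": "Treasury annual report", "source": "records system"},
        {"id": 2, "title": "Budget database guide", "source": "intranet"},
        {"id": 3, "title": "Annual budget statistics", "source": "catalog"},
    ]

    index = defaultdict(set)              # term -> ids of documents containing it
    for doc in documents:
        for term in doc["title"].lower().split():
            index[term].add(doc["id"])

    def local_search(query):
        # intersect the posting lists for every query term
        ids = None
        for term in query.lower().split():
            ids = index[term] if ids is None else ids & index[term]
        return [doc for doc in documents if doc["id"] in (ids or set())]

    print(local_search("annual budget"))

Here the trade-off runs the other way: search is fast and uniform, but results
are only as fresh as the last harvest.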

Build vs. Buy? It obviously has taken Steve and his colleagues a lot of
hard work to produce a nice looking system (except for all those big
black bits on the screen!) and it obviously takes maintenance (it is
'fragile'). Do you think it was/is worth it and if so why?

Peter Noerr

 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Steve Oberg
 Sent: Thursday, July 10, 2008 8:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Enterprise Search and library collection
 [SEC=UNCLASSIFIED]
 
 Renata and others,
 
 After posting my original reply I realized how dumb it was to respond but
 say, sorry, can't tell you more.  As an aside, this is one of the things
 that irritates me the most about working in a for profit environment: the
 control exerted by MPOW over just about anything. But hey, this is the job
 situation I've consciously chosen so, I guess I shouldn't complain.
 
 Although I can't name names and go into detail about our implementation, I
 have anonymized screenshots of various aspects of it and posted details
 about it at
 http://familymanlibrarian.com/2007/01/21/more-on-turning-the-catalog-inside-out/
 Keep in mind that my involvement has been focused on the catalog side.  A
 lot of the behind-the-scenes work also dealt with matching subject terms in
 catalog records to the much simpler taxonomy chosen for our website.  You
 can imagine that it can be quite complicated to set up a good rule set for
 matching LCSH or MeSH terms effectively to a more generic set of taxonomy
 terms and have those be meaningful to end users. We are continually
 evaluating and tweaking this setup.
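
As an illustrative aside (not the library's actual rule set), a toy sketch of
the kind of mapping being described: detailed LCSH or MeSH subject headings are
matched by simple keyword rules onto a small, generic website taxonomy. The
rules and taxonomy terms below are invented examples.

    # Toy rule set mapping subject headings to a simpler site taxonomy.
    RULES = [
        ("heart",     "Cardiology"),
        ("neoplasms", "Cancer"),
        ("cancer",    "Cancer"),
        ("diabetes",  "Diabetes"),
    ]

    def map_headings(subject_headings):
        mapped = set()
        for heading in subject_headings:
            text = heading.lower()
            for keyword, taxonomy_term in RULES:
                if keyword in text:
                    mapped.add(taxonomy_term)
        return sorted(mapped) or ["General"]   # fall back when no rule fires

    print(map_headings(["Heart Diseases -- Prevention",
                        "Neoplasms, Experimental"]))

A production rule set would need precedence, exceptions, and ongoing review,
which is exactly the continual tweaking mentioned above.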
 
 As far as other general details, this implementation involved a lot of
 people, in fact a team of about 15, some more directly and exclusively and
 others peripherally.  In terms of maintenance, day to day maintenance is
 handled by about three FTE.  Our library catalog data is refreshed once a
 day, as is the citation database to which I referred in the previous email,
 and content from our web content management environment.  A few other
 repositories are updated weekly because their content isn't as volatile.
 The whole planning and implementation process took a year and is still
 really working through implementation issues. For example we recently
 upgraded the version of our enterprise search tool to a newer version and
 this was a major change requiring a lot of resources and it took a lot more
 time to do than expected.
 
 I hope this additional information is helpful.
 
 Steve
 
 On Tue, Jul 8, 2008 at 1:11 AM, Dyer, Renata [EMAIL PROTECTED] wrote:
 
  Our organisation is looking into getting an enterprise search and I was
  wondering how many libraries out there have incorporated their library
  collection into a 'federated' search that would retrieve a whole lot:
  library collection items, external sources (websites, databases),
  internal documents (available on share drives and/or records systems),
  maybe even records from other internal applications, etc.?
 
 
  I would like to hear about your experience and what is good or bad about
  it.
 
  Please reply on or offline, whichever is more convenient.
 
  I'll collate answers.
 
  Thanks,
 
  Renata Dyer
  Systems Librarian
  Information Services
  The Treasury
  Langton Crescent, Parkes ACT 2600 Australia
  (p) 02 6263 2736
  (f) 02 6263 2738
  (e) [EMAIL PROTECTED]
 
  https://adot.sirsidynix.net.au/uhtbin/cgisirsi/ruzseo2h7g/0/0/49
 
 
 