Re: [CODE4LIB] de-dupping (was: marc4j 2.4 released)

2008-11-04 Thread Min-Yen Kan
Hi Michael:

Thanks for your email.  No, we haven't implemented any merging system.
Our software currently just tries to cluster similar/identical
records.  We may consider creating a generic merge algorithm, which
could then be customized to make some of the canonicalizations you
pointed to eas(ier) to do.  As for integrating it with marc4j, we
currently don't have specific plans (although we'd appreciate help
from any interested folks).
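
To make the clustering-versus-merging distinction concrete, here is a minimal sketch of clustering only. It is not the software described above: the record fields ("id", "title", "author") and the exact-match-on-normalized-key rule are assumptions chosen for illustration, and a real system would likely use fuzzier matching.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def cluster_records(records):
    """Group record ids whose normalized (title, author) keys are identical.

    Hypothetical record shape: {"id": ..., "title": ..., "author": ...}.
    No merging is attempted; each cluster is just a list of ids.
    """
    clusters = defaultdict(list)
    for record in records:
        key = (normalize(record["title"]), normalize(record["author"]))
        clusters[key].append(record["id"])
    return list(clusters.values())


records = [
    {"id": 1, "title": "Moby Dick; or, The Whale", "author": "Melville, Herman"},
    {"id": 2, "title": "Moby-Dick, or the whale.", "author": "MELVILLE, HERMAN"},
    {"id": 3, "title": "Typee", "author": "Melville, Herman"},
]
clusters = cluster_records(records)
```

A merge step would then have to pick or combine field values within each cluster, which is the part not yet implemented.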

> So back to the de-dup thing (things got busy here). Has anyone
> implemented a merging algorithm like this one:
> http://www.kcoyle.net/temp/merge.html
>
> It's referred to via openlibrary here:
> http://openlibrary.org/about/lib
>
> Putting something like this in marc4j would be sweet.
> Mike Beccaria
> Systems Librarian
> Head of Digital Initiatives
> Paul Smith's College
> 518.327.6376
> [EMAIL PROTECTED]

Cheers,

Min

--
Min-Yen KAN (Dr) :: Assistant Professor :: National University of
Singapore :: School of Computing, AS6 05-12, Law Link, Singapore
117590 :: 65-6516 1885(DID) :: 65-6779 4580 (Fax) ::
[EMAIL PROTECTED] (E) :: www.comp.nus.edu.sg/~kanmy (W)



Re: [CODE4LIB] [Fwd: Fwd: [DC-GENERAL] DCMI News 3 November 2008]

2008-11-04 Thread Lovins, Daniel
Karen,

I don't have anything useful to add, but just wanted to express my gratitude 
and second Owen's comment that this document is very nicely done.

The breakdown of key components (e.g., functional requirements vs. domain model 
vs. usage guidelines, etc.)  is quite helpful, as is the diagram of the 
Singapore Framework.

I also appreciated the concrete example of the "Bookshelf DCAP", and the 
demonstration of RDF triples in the context of a domain model (i.e., "book" and 
"author" as entities; "title" and "name" as properties; "is authored by" as a 
relationship).
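
As a rough illustration of that domain model, the entities, properties, and relationship can be written as subject-predicate-object triples. This is only a sketch using plain Python tuples: the example.org identifiers are hypothetical, and the choice of dcterms:title, dcterms:creator, and foaf:name as predicates is an assumption rather than something taken from the DCAP document.

```python
# Namespace prefixes; EX is a made-up namespace for illustration only.
EX = "http://example.org/"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Each triple is (subject, predicate, object).
triples = [
    (EX + "book/1", DCTERMS + "title", "Moby-Dick"),            # property of a book
    (EX + "book/1", DCTERMS + "creator", EX + "person/1"),      # "is authored by"
    (EX + "person/1", FOAF + "name", "Herman Melville"),        # property of an author
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def author_names(book):
    """Follow the authorship relationship, then read each author's name."""
    names = []
    for author in objects(book, DCTERMS + "creator"):
        names.extend(objects(author, FOAF + "name"))
    return names
```

Calling author_names(EX + "book/1") walks from the book entity to the author entity and reads the name property there.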

I'm intrigued by the possibility of integrating more dynamic, visually 
interesting applications of LCSH (http://lcsh.info/) and other vocabularies 
into our catalogs, and this document helps me better understand the 
prerequisites and opportunities to keep in mind.

/ Daniel

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Karen Coyle
Sent: Tuesday, November 04, 2008 1:07 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] [Fwd: Fwd: [DC-GENERAL] DCMI News 3 November 2008]

Thank you, Owen! A few comments interspersed...

Stephens, Owen wrote:
> Hi Karen,
>
> Yes - the document on DCAP makes sense (this may be the first time I've
> ever uttered these words on a first reading of DCMI documentation - so
> well done!)
>
wow


Re: [CODE4LIB] [Fwd: Fwd: [DC-GENERAL] DCMI News 3 November 2008]

2008-11-04 Thread Karen Coyle

Thank you, Owen! A few comments interspersed...

Stephens, Owen wrote:

> Hi Karen,
>
> Yes - the document on DCAP makes sense (this may be the first time I've
> ever uttered these words on a first reading of DCMI documentation - so
> well done!)
  

wow


> I would question what the benefit of doing a full DCAP is as opposed to
> doing the bits that are clearly of practical value. Although I buy the
> argument that they promote sharing/linking of data in theory, I
> haven't seen any real-world examples of this - has SWAP had more of an
> impact because there is a DCAP for it?

Not that I'm aware of. DCAP (as well as DCAM) are pretty much in their 
embryonic stages and haven't had real world proof yet. There are APs 
that use some of the DCAP concepts but not all, and in fact it would be 
very difficult at this point to create a full AP for libraries since we 
don't have our vocabularies all defined in RDF. So I agree that an 
intermediate approach makes sense at this moment in time.

> If we were starting from scratch
> a DCAP would be (at least) as good a way as any other of capturing stuff
> like functional requirements and Usage Guidelines - but since these
> don't actually add to the functionality of the metadata scheme you end
> up with as far as I can see, where we already have this stuff in other
> forms (as suggested in the Usage Guidelines) then what would be the
> tangible benefits of restating for a DCAP? (I suppose the flip side of
> this is - would it be much work to do so?)
  
I think this is an excellent question, and one that needs to be 
addressed by the DC community. It is incumbent on them to make the case 
for their standards in a way that translates to a real motivation for 
metadata developers. The DCAP document goes further in this direction 
than other documents, but the benefits of DCAM are less clearly expressed.

> Touching on the Usage Guidelines - I'd question whether the example
> given of AACR2 as an existing set of usage guidelines which you could
> refer to in the DCAP is completely accurate? Doesn't AACR2 hold a
> mixture of things that are usage guidelines, and things that would live
> in the DSP? If this is so, it may be worth being explicit about this to
> avoid misunderstandings.
  
I'm not sure that AACR2 (or RDA) go much beyond usage guidelines. They 
don't define data elements as such, and they don't provide a record 
format. They are about making decisions about the description of 
something. But I think I know what you mean, because we don't have 
anything BUT the cataloging rules to go on so they seem to embody our 
data definitions as well. But not the formal data definitions, which 
then get done after the fact in MARC. It's not a good approach to 
define and manage these two standards separately.

> Further on the Usage Guidelines, one of the examples of a possible
> guideline is " For works of multiple authorship, the order of authors
> and how many to include (e.g. first 3, or no more than 20)". I'm not
> clear why you would express 'no more than 20' here, rather than as part
> of the relevant Description Template in the DSP?
  
It's just an example, but I see that it's confusing. In fact, you could 
have those kinds of instructions either in the DSP or the usage 
guidelines, or in both. For example, you can use Dublin Core fields, 
which have no limitations on repeatability or mandatoriness, but can 
include rules in the usage guidelines that aren't enforced in the DSP. 
However, I'll change this example to be about the ORDER of authors, 
which makes more sense in guidelines. Does that sound better?
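
To illustrate the split, here is a minimal sketch of the kind of constraint a DSP could enforce mechanically, such as "no more than 20 authors". It does not use actual DSP syntax: the profile dictionary, the property names, and the check function are all hypothetical. A rule about the ORDER of authors is deliberately left out, since that is the sort of instruction that lives more naturally in usage guidelines.

```python
# Hypothetical machine-checkable constraints: occurrence limits per property.
profile = {
    "title": {"min": 1, "max": 1},      # mandatory, not repeatable
    "creator": {"min": 0, "max": 20},   # optional, at most 20 values
}


def check(description, profile):
    """Return a list of constraint violations for one description.

    `description` maps property names to lists of values.
    """
    problems = []
    for prop, rule in profile.items():
        count = len(description.get(prop, []))
        if count < rule["min"]:
            problems.append(f"{prop}: expected at least {rule['min']}, found {count}")
        if count > rule["max"]:
            problems.append(f"{prop}: expected at most {rule['max']}, found {count}")
    return problems


too_many_authors = {"title": ["A Big Report"], "creator": ["Author"] * 21}
missing_title = {"creator": ["Author"]}
```

Running check(too_many_authors, profile) reports the creator limit, while plain Dublin Core fields with no such profile would accept either description.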

> In terms of the library world, a question that occurs is that if we went
> down this route, would we find that we ended up with a single DCAP for
> libraries? As I think about it I wonder if we would find multiple DCAPs
> were required - perhaps Public Libraries would have a different DCAP to
> Research Libraries. Possibly more likely different types of collections
> would require different DCAPs. For example, it seems likely to me that
> the Functional Requirements for a rare books collection are different to
> those of the DVD collection. Further, it seems likely to me that the
> requirements for the DVD collection in my local public library are
> different to those of the DVD collection at my local media-arts college.
  
Personally, I am totally for multiple APs for the library world. One of 
the things that makes the cataloging rules so complex, and our records 
so complex, is that they try to cover every possible type of resource 
for every possible type of library. And therefore they fail for some 
percentage of the cases. Your examples here make perfect sense to me.

> If this is the case, what are the implications of mixing DCAPs within or
> across libraries? How would different DCAPs work together? What would be
> the implications for sharing records? Am I looking for problems here, or
> anticipating real issues? (I did read the document on Interoperability,
> but not sure I understand what it is getting at yet - however, I'm not
> sure

Re: [CODE4LIB] "release management"

2008-11-04 Thread Andrew Nagy
I second the notion for Fogel's book.

From: Code for Libraries [EMAIL PROTECTED] On Behalf Of Randy Metcalfe [EMAIL PROTECTED]
Sent: Wednesday, October 29, 2008 10:42 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] "release management"

2008/10/29 Jonathan Rochkind <[EMAIL PROTECTED]>:
> Can anyone recommend any good sources on how to do 'release management' in
> a small distributed open source project? Or in a small in-house not open
> source project, for that matter. The key thing is not something assuming
> you're in a giant company with a QA team, but instead a small project with
> a few (to dozens) of developers, no dedicated QA team, etc.
>
> Anyone have any good books to recommend on this?

Karl Fogel's book Producing Open Source Software is an excellent
choice, though it is not solely focused on release management.

http://producingoss.com/

Cheers,

Randy

--
Randy Metcalfe


Re: [CODE4LIB] Code4lib mugs?

2008-11-04 Thread Richard Wallis

And there was me thinking I had the monopoly on great ideas!

Whilst still confessing a yearning for a code4lib emblazoned hot  
beverage container, I do concede that the audio/video support idea  
does have merits.


Would someone like to contact me directly to explore this option  
further?


~Richard

On 4 Nov 2008, at 15:57, Doran, Michael D wrote:


> > John Fereira wrote:
> > A Talis sponsorship of audio/video support:  Not only benefits
> > attendees but benefits those that can't attend the conference
> > and can watch the audio/video captures after the conference.
> >
> > Seems to me that #3 is a clear winner.
>
> That does seem like a win-win option.  Especially given Kevin
> Clarke's suggestion that a Talis acknowledgement could be included
> in the videos.
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # [EMAIL PROTECTED]
> # http://rocky.uta.edu/doran/
>
> > -----Original Message-----
> > From: Code for Libraries [mailto:[EMAIL PROTECTED] On
> > Behalf Of John Fereira
> > Sent: Tuesday, November 04, 2008 12:28 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] Code4lib mugs?
> >
> > Jonathan Rochkind wrote:
> > > Aha, funding the audio and video is a great idea. Meets Code4Lib
> > > needs, and also meets sponsor advertising needs, because all the
> > > videos and audio could go up with a "capture of this content was
> > > sponsored by Insert Vendor Here" link. I think Bill's idea is great.
> > > Someone would still need to be found to volunteer to recruit and
> > > supervise this hypothetical student.
> >
> > A Talis sponsored mug:  Benefits everyone that attends the
> > conference a little
> >
> > A Talis sponsored scholarship:  Benefits  only one person and if it's
> > like some of the previous scholarship excludes some from being
> > eligible to receive it.
> >
> > A Talis sponsorship of audio/video support:  Not only benefits
> > attendees but benefits those that can't attend the conference and
> > can watch the audio/video captures after the conference.
> >
> > Seems to me that #3 is a clear winner.



Richard Wallis
Technology Evangelist, Talis
Tel: +44 (0)870 400 5422 (Direct)
Tel: +44 (0)870 400 5000 (Switchboard)
Tel: +44 (0)7767 886 005 (Mobile)
Fax: +44 (0)870 400 5001

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
IM: [EMAIL PROTECTED]
I-Name: =Richard.Wallis


Re: [CODE4LIB] Code4lib mugs?

2008-11-04 Thread Doran, Michael D
> John Fereira wrote:
> A Talis sponsorship of audio/video support:  Not only benefits
> attendees but benefits those that can't attend the conference
> and can watch the audio/video captures after the conference.
> 
> Seems to me that #3 is a clear winner.

That does seem like a win-win option.  Especially given Kevin Clarke's 
suggestion that a Talis acknowledgement could be included in the videos. 

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [EMAIL PROTECTED]
# http://rocky.uta.edu/doran/
  

> -----Original Message-----
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On 
> Behalf Of John Fereira
> Sent: Tuesday, November 04, 2008 12:28 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Code4lib mugs?
> 
> Jonathan Rochkind wrote:
> > Aha, funding the audio and video is a great idea. Meets Code4Lib
> > needs, and also meets sponsor advertising needs, because all the
> > videos and audio could go up with a "capture of this content was
> > sponsored by Insert Vendor Here" link. I think Bill's idea is great.
> > Someone would still need to be found to volunteer to recruit and
> > supervise this hypothetical student.
>
> A Talis sponsored mug:  Benefits everyone that attends the
> conference a little
>
> A Talis sponsored scholarship:  Benefits  only one person and if it's
> like some of the previous scholarship excludes some from being
> eligible to receive it.
>
> A Talis sponsorship of audio/video support:  Not only benefits
> attendees but benefits those that can't attend the conference and can
> watch the audio/video captures after the conference.
>
> Seems to me that #3 is a clear winner.
> 


Re: [CODE4LIB] [Fwd: Fwd: [DC-GENERAL] DCMI News 3 November 2008]

2008-11-04 Thread Stephens, Owen
Hi Karen,

Yes - the document on DCAP makes sense (this may be the first time I've
ever uttered these words on a first reading of DCMI documentation - so
well done!)

There are parts of this that I definitely think would be beneficial for
library data. Specifically the analysis of metadata against schemes that
already exist, and the development of a Description Set Profile (in a
machine readable format).

I would question what the benefit of doing a full DCAP is as opposed to
doing the bits that are clearly of practical value. Although I buy the
argument that they promote sharing/linking of data in theory, I
haven't seen any real-world examples of this - has SWAP had more of an
impact because there is a DCAP for it? If we were starting from scratch
a DCAP would be (at least) as good a way as any other of capturing stuff
like functional requirements and Usage Guidelines - but since these
don't actually add to the functionality of the metadata scheme you end
up with as far as I can see, where we already have this stuff in other
forms (as suggested in the Usage Guidelines) then what would be the
tangible benefits of restating for a DCAP? (I suppose the flip side of
this is - would it be much work to do so?)

Touching on the Usage Guidelines - I'd question whether the example
given of AACR2 as an existing set of usage guidelines which you could
refer to in the DCAP is completely accurate? Doesn't AACR2 hold a
mixture of things that are usage guidelines, and things that would live
in the DSP? If this is so, it may be worth being explicit about this to
avoid misunderstandings.

Further on the Usage Guidelines, one of the examples of a possible
guideline is " For works of multiple authorship, the order of authors
and how many to include (e.g. first 3, or no more than 20)". I'm not
clear why you would express 'no more than 20' here, rather than as part
of the relevant Description Template in the DSP?

In terms of the library world, a question that occurs is that if we went
down this route, would we find that we ended up with a single DCAP for
libraries? As I think about it I wonder if we would find multiple DCAPs
were required - perhaps Public Libraries would have a different DCAP to
Research Libraries. Possibly more likely different types of collections
would require different DCAPs. For example, it seems likely to me that
the Functional Requirements for a rare books collection are different to
those of the DVD collection. Further, it seems likely to me that the
requirements for the DVD collection in my local public library are
different to those of the DVD collection at my local media-arts college.

If this is the case, what are the implications of mixing DCAPs within or
across libraries? How would different DCAPs work together? What would be
the implications for sharing records? Am I looking for problems here, or
anticipating real issues? (I did read the document on Interoperability,
but not sure I understand what it is getting at yet - however, I'm not
sure it really is about this kind of interoperability?)

Finally, it looks to me like RDA would benefit immensely from being
expressed as a DSP plus usage guidelines...

Owen

Owen Stephens
Assistant Director: eStrategy and Information Resources
Central Library
Imperial College London
South Kensington Campus
London
SW7 2AZ
 
t: +44 (0)20 7594 8829
e: [EMAIL PROTECTED]

> -----Original Message-----
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Karen Coyle
> Sent: 04 November 2008 13:42
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] [Fwd: Fwd: [DC-GENERAL] DCMI News 3 November 2008]
> 
> Folks, two new documents have been published on the Dublin Core web
> site, and I would very much like to get any comments you have on them.
> Officially, comments must be sent to the dc-general list (details
> below), but if there is discussion on these lists, I can summarize it
> there.
> 
> The first document is one I worked on -- painfully, I must say -- that
> attempts to explain the DC concept of Application Profiles. These are
> concepts we want to apply in the DC/RDA work, and my personal question
> to you all is: DOES THIS MAKE SENSE? Can we use this in our metadata
> environment? What's missing, what doesn't work, what needs
> clarification?
> 
> The next document addresses something I blogged recently:
>   http://kcoyle.blogspot.com/2008/10/semantics-of-semantic.html
> which is some confusion caused by the use of the term "semantic web."
> This document is related to the Application Profile document in that it
> defines what we need so that different metadata sets can be
> interoperable, another very important point for those of us working in
> the library systems area. The document is from an engineering point of
> view in its details, but the general concepts are quite common
> sense-ible. Again, please let us know if there are areas that need
> clarification.
> 
> Given that this is election day, may I suggest that a printout of one
> or
> 

Re: [CODE4LIB] Code4lib mugs?

2008-11-04 Thread John Fereira

Jonathan Rochkind wrote:
> Aha, funding the audio and video is a great idea. Meets Code4Lib
> needs, and also meets sponsor advertising needs, because all the
> videos and audio could go up with a "capture of this content was
> sponsored by Insert Vendor Here" link. I think Bill's idea is great.
> Someone would still need to be found to volunteer to recruit and
> supervise this hypothetical student.

A Talis sponsored mug:  Benefits everyone that attends the conference a 
little


A Talis sponsored scholarship:  Benefits  only one person and if it's 
like some of the previous scholarship excludes some from being eligible 
to receive it.


A Talis sponsorship of audio/video support:  Not only benefits attendees 
but benefits those that can't attend the conference and can watch the 
audio/video captures after the conference.


Seems to me that #3 is a clear winner.


[CODE4LIB] extreme reference FW: [STS-L] ACRL invites Cyber Zed Shed proposals - Dec.12 deadline

2008-11-04 Thread Jodi Schneider
Extreme reference sounds interesting, Bill. And a good fit for this ACRL
session, I think. (Maybe others have prospective proposals as well...)
-Jodi

 

> One person sits at the computer, typing and searching and browsing,
> and the other has more time to think, talk, ponder,
> reach into memory, and kibitz.

 

 

From: Adam Burling [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 29, 2008 11:20 AM
To: [EMAIL PROTECTED]
Subject: [STS-L] ACRL invites Cyber Zed Shed proposals - Dec.12 deadline

 

Are you a tech savvy librarian using new technologies in innovative
ways? Adapting existing technologies to reach user needs? Here is an
opportunity to share your innovations with your colleagues, library
administrators, and others at ACRL 14th National Conference, March
12-15, 2009, in Seattle! Grab your 20 minutes of fame and educate others
- submit a proposal for a Cyber Zed Shed presentation.  

The ACRL 14th National Conference Innovations Committee is looking for
proposals that document technology-related innovations in every area of
the library. Whether you are teaching in a classroom, answering
questions from patrons; acquiring, cataloging, processing or preserving
materials; or providing other services, we're interested! We invite you
to submit your most innovative proposal to help us make Seattle the site
of a truly groundbreaking conference!  

FORMAT
Cyber Zed Shed presentations are 20 minutes in length, with fifteen
minutes to present a demonstration, and five additional minutes for
audience Q&A. Presentations should document technology-related
innovations in academic and research libraries. A computer, data
projector, screen, and microphone will be provided in the Cyber Zed Shed
theater. You will be responsible for bringing all other equipment
required for your demonstration, except as agreed to in advance.

Cyber Zed Shed presentations will be held from 9:00 a.m. - 4:00 p.m. on
Friday, March 13 and Saturday, March 14 in the Cyber Zed Shed theater,
adjacent to the exhibit floor in Seattle.

HOW TO SUBMIT A PROPOSAL
Proposals must be submitted via the online proposal form:

https://marvin.foresightint.com/surveys/Tier1Survey/ACRL/284

Please have the following information ready at the time you submit
your proposal:

*   Contact information 
*   How does this technology make you and/or your library more
effective, efficient, or productive? (200 word limit) 
*   Describe your innovative application of technology as it applies
to libraries. (200 word limit) 
*   What technology will you require for your presentation? Please
list all equipment, software, and connections you will need/bring for
your demonstration. 
*   Time slot preference (Friday morning, Friday afternoon, Saturday
morning, Saturday afternoon). 

DEADLINE
Proposals must be submitted by Friday, December 12, 2008 (midnight CST).

NOTIFICATIONS
Applicants will be notified via e-mail in January 2009.

Visit
http://www.acrl.org/ala/mgrps/divs/acrl/events/seattle/program/cyberzedshed.cfm
for complete details. Questions should be directed to Margot Conahan at
[EMAIL PROTECTED], or call 312-280-2522.

**

The Association of College and Research Libraries (ACRL) is a division
of the American Library Association (ALA), representing nearly 13,000
academic and research librarians and interested individuals. ACRL is the
only individual membership organization in North America that develops
programs, products and services to meet the unique needs of academic and
research librarians. Its initiatives enable the higher education
community to understand the role that academic libraries play in the
teaching, learning and research environments. ACRL is on the Web at
http://www.acrl.org.

 


[CODE4LIB] [Fwd: Fwd: [DC-GENERAL] DCMI News 3 November 2008]

2008-11-04 Thread Karen Coyle
Folks, two new documents have been published on the Dublin Core web 
site, and I would very much like to get any comments you have on them. 
Officially, comments must be sent to the dc-general list (details 
below), but if there is discussion on these lists, I can summarize it there.


The first document is one I worked on -- painfully, I must say -- that 
attempts to explain the DC concept of Application Profiles. These are 
concepts we want to apply in the DC/RDA work, and my personal question 
to you all is: DOES THIS MAKE SENSE? Can we use this in our metadata 
environment? What's missing, what doesn't work, what needs clarification?


The next document addresses something I blogged recently:
  http://kcoyle.blogspot.com/2008/10/semantics-of-semantic.html
which is some confusion caused by the use of the term "semantic web." 
This document is related to the Application Profile document in that it 
defines what we need so that different metadata sets can be 
interoperable, another very important point for those of us working in 
the library systems area. The document is from an engineering point of 
view in its details, but the general concepts are quite common 
sense-ible. Again, please let us know if there are areas that need 
clarification.


Given that this is election day, may I suggest that a printout of one or 
both of these documents will occupy you fully while you are in line 
waiting to perform your patriotic (and moral) duty. VOTE! READ! EVOLVE!


Thank you,
kc

_

"Guidelines for Dublin Core Application Profiles" published as a Working 
Draft


2008-11-03, The new DCMI Working Draft
< http://dublincore.org/documents/2008/11/03/profile-guidelines/ >
"Guidelines for Dublin Core Application Profiles" describes the
key components of an application profile and walks the reader
through the process of designing a profile. Addressed primarily
to a non-technical audience, the guidelines also provide a
technical appendix about modeling the metadata interoperably
for use in linked data environments. This draft will be revised
in response to feedback from readers. Interested members of
the public are invited to post comments by 1 December 2008 to the
< http://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=dc-general >
DC-GENERAL mailing list, including "[Public Comment]"
in the subject line.

_

"Interoperability Levels for Dublin Core Metadata" published as a 
Working Draft


2008-11-03, < 
http://dublincore.org/documents/2008/11/03/interoperability-levels/ >

"Interoperability Levels for Dublin Core Metadata", published
today as a DCMI Working Draft, discusses the modeling choices involved
in designing metadata applications for different types of interoperability.
At Level 1, applications use data components with shared natural-language
definitions. At Level 2, data is based on the formal-semantic model of the
W3C Resource Description Framework. At Level 3, data is structured as
Description Sets (i.e., as records). At Level 4, data content is subject to
a shared set of constraints (as described in a Description Set Profile).
Conformance tests and examples are provided for each level. The Working
Draft represents work in progress for which the authors seek feedback.
Interested members of the public are invited to post comments by 1 December
2008 to the < 
http://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=dc-architecture >

DC-ARCHITECTURE mailing list, including "[Public Comment]" in the subject
line.

Thank you!
kc


-
--  ---
Karen Coyle / Digital Library Consultant
[EMAIL PROTECTED]  http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
mo.: 510-435-8234