Re: [RDA-L] Browse and search BNB open data

2011-08-05 Thread Bernhard Eversberg

05.08.2011 00:36, Karen Coyle:


 John Attig:
 Access points are treated rather strangely in RDA. The access
 point is not itself an element, but is a construct made up of other
 elements, which contains instructions about what and when to
 include various elements in an access point.

 That actually makes sense from a data design point of view. It means
 that compound things can be built up of simple things, and that
 means that you have flexibility in what you can build. (read:
 tinker-toys, or, for the younger set, Legos)


Very important indeed, but elementary for any data technician.
Not quite so for those who have been raised on AACR+MARC. They
find it strange, as John seems to indicate. Why is that so?
Starting out from the mental image of MARC,
one may find it natural that everything that can be accessed in
a search must be recorded in some data field, and exactly in the
way it is needed for the access. This notion needs to be shattered.
It has led to such extremes that, for instance, in authority
records you have 53 variant names, each and every one of them
carrying the same dates for that person. The access points for
the variant names can, however, easily be constructed out of
a name field plus a date field - the latter always the same.
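
A minimal sketch of that construction in Python. The record layout,
the class name and the sample names are assumptions made up for
illustration, not any real authority format: the dates are stored
once, and every access point is assembled on demand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonAuthority:
    """Hypothetical authority record: each fact is stored exactly once."""
    preferred_name: str
    dates: str
    variant_names: List[str] = field(default_factory=list)

    def access_points(self) -> List[str]:
        # Combine every name form with the single stored date value,
        # instead of repeating the dates inside each variant heading.
        names = [self.preferred_name, *self.variant_names]
        return [f"{name}, {self.dates}" for name in names]


# Illustrative data only.
record = PersonAuthority(
    preferred_name="Twain, Mark",
    dates="1835-1910",
    variant_names=["Clemens, Samuel Langhorne"],
)
for point in record.access_points():
    print(point)
```

With 53 variant names the record would still carry the dates only
once; a correction to the dates is then a single edit.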

MARC derives from the requirements of card printing. There, each
heading (access point in the card catalog) had to be complete
and correctly formed as part of the record. This is no longer
true, and has never been true in data processing systems:

1. Headings can be constructed out of arbitrary elements,
   they need not be stored as monolithic strings inside the record

2. New access points can be constructed that had never been
   possible in card catalogs. All kinds of combinations and
   reformattings of field contents can be programmed, no need
   to have every access point prepared in advance and stored
   in its own field. For example, extract the publisher's name
   out of the 260 and remove certain particles from it, and then
   get the date out of the fixed fields to make a useful index
   entry (access point) like  name:date
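
The publisher example in point 2 might be sketched like this in
Python. The function name, the particle list and the sample strings
are assumptions for illustration; the only MARC facts relied on are
that 260 $b holds the publisher name and that 008 positions 07-10
hold Date 1.

```python
import re

# Hypothetical stop-list of "particles" to drop from publisher names.
PARTICLES = {"the", "and", "co", "inc", "ltd", "verlag"}


def publisher_date_key(publisher_260b: str, field_008: str) -> str:
    """Build a 'name:date' index entry from 260 $b and the 008 field."""
    # 008/07-10 is Date 1 in MARC 21 bibliographic records.
    date = field_008[7:11]
    # Keep runs of letters only, which also drops punctuation and "&".
    words = re.findall(r"[^\W\d_]+", publisher_260b.lower())
    kept = [word for word in words if word not in PARTICLES]
    return f"{' '.join(kept)}:{date}"


# Illustrative input; the 008 value is shortened to its first positions.
print(publisher_date_key("Harper & Row, Inc.,", "110805s2011    nyu"))
```

Nothing is added to the record for this: the index entry exists only
in the index, and the rule that builds it can be changed at any time.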

This is easy to understand, but as a consequence, the rules, and
thus the data model, will become more abstract and more difficult
to understand. But maybe only for someone who has been brought up
on the notions of the card catalog and later those of MARC. For
someone with a background in abstract data structures, John Attig's
clarifications are no surprise at all.

One more reason, one might think, to get rid of MARC ASAP.
Not really, though. Firstly, because it is utterly unrealistic,
and secondly, because MARC is flexible enough to be used in
new software applications that do new tricks with the old
stuff AND are able to deal with some new data elements in
novel ways. It is not the worst of ideas to look at the
additions Germans and Austrians have thought up for their
MARC dialect. It will allow us to continue with our
scenario 2 applications as they are long since in operation,
and the further step to scenario 1, if at all necessary and
useful, would not be very difficult either.
We are not using MARC internally, and are not going to, but
our internal formats are no less complex. They are only not
rooted in the mental image of the card.

B.Eversberg



Re: [RDA-L] Browse and search BNB open data

2011-08-05 Thread James Weinheimer

On 04/08/2011 21:33, Karen Coyle wrote:
snip

But the rule is that mostly, you use the publication date of
the first manifestation of the expression.  (I can't find the rule 
for this right now, since I don't have access to a lot) The only 
example I can find right now is King Kong: 
http://lccn.loc.gov/90715189, where if you look at the related 
titles, you will see 1933, while the date of publication of this item 
is 1984. King Kong (Motion picture : 1933)


Aha! Thanks. Although... isn't this an even more arcane bit of data 
than the first date of the work? And many (including you) were 
doubtful that catalogers could supply that.

/snip

Not really, because focusing on the manifestation assumes that there has 
been something published somewhere. Most of the time this is fairly 
simple, because often, your (later) item discusses the earlier version 
and saves you a lot of time. If your item does not supply this 
information, too bad, but by following the rule of Seek and ye shall 
find, which sometimes might take quite a bit of work, by using 
Worldcat, the NUC, and all kinds of other catalogs out there, plus a bit 
of ingenuity, you can normally find a record or citation to that first 
item published. Besides, most normal catalogers do such an amount of 
research very rarely. It wouldn't surprise me if the lack of real 
consistency in these fields reflects the cataloger's lack of time, plus 
the general feeling that few patrons understand, use, or want uniform 
titles so it is not worthwhile spending the time. (I don't necessarily 
agree, as I discuss below, but the feeling is out there)


Comparing this to hunting out a first date of something as vague as the 
work, which would have to be done much more often and would probably 
always require research, is quite a different matter.


snip
In general I am having a hard time understanding how we will treat 
these kinds of composite headings in any future data carrier. They 
seem to be somewhat idiosyncratic, in that what data gets added is up 
to the cataloger, depends on the context, and probably cannot be 
generated algorithmically. This whole part about headings (access 
points in RDA, I believe) has me rather stumped from a design point of 
view. At the same time, if all of the individual elements are 
available, and one links manifestations of a single expression, then 
some system feature may be able to display this distinction to the 
user without the use of individual cataloger-formed headings. This 
would also mean that the records can be created without being 
dependent on a particular context, which should make sharing of data 
even more accurate.

/snip

In defense of catalogers, the entire system was originally designed for 
a card/print world where everyone had no choice except to browse, and 
the method worked fairly well back then. This is shown in Princeton's 
scanned catalog for Cicero's Pro Milone 
(http://imagecat1.princeton.edu/cgi-bin/ECC/cards.pl/disk3/0892/B4159?d=fp=Cicero,+Marcus+Tullius--Individual+works--Pro+Archia+%3Eg=52977.50n=47r=1.00thisname=.0047.tiff) 
and browsing forward from there, you can see how the uniform titles 
worked, and kept things more or less in order. (At Princeton, most of 
the uniform titles were handwritten in pencil in the top right hand 
corner and unfortunately pencil came out very poorly in the scans. 
Still, I think you can make out the titles and dates.)


You will see that the language translations are mostly mixed together, 
although one includes the qualifier Greek. In spite of this, the final 
product worked fairly well though, because it was pretty easy--once you 
got to Cicero. Pro Archia--to browse through the cards.


Still, I think that instead of trying to shoehorn our data, which was 
created for another time, to function more or less crudely in the new 
environment, it would be far more progressive to reconsider how to 
use the power of the current systems we have at our disposal. Uniform 
titles are a great case in point. As we saw in the Princeton catalog, 
even when they weren't done perfectly, uniform titles worked pretty well 
in a physical environment where browsing was the only way of finding 
things, but they fell apart in a computerized/keyword environment, just 
as much of the rest of the catalog. (For those interested in more on 
this, see my posting on Autocat 
http://catalogingmatters.blogspot.com/2010_10_01_archive.html) Today 
using Worldcat, I can search for au: homer and ti: odyssey 
http://www.worldcat.org/search?q=ti%3Aodyssey+au%3Ahomer and get a very 
handy, useful list that I can do a lot with: limit to books, by 
language, by dates, by translators, novel sorting etc. Today, Zebra-type 
indexing extracts the headings and other information and makes them 
available for further refinements, so we get something that so far as I 
am concerned, is far better than how the clunky, old card/printed 
catalog ever worked. (Compare the Cicero example 

Re: [RDA-L] Browse and search BNB open data

2011-08-05 Thread Karen Coyle

Quoting Bernhard Eversberg e...@biblio.tu-bs.de:



One more reason, one might think, to get rid of MARC ASAP.
Not really, though. Firstly, because it is utterly unrealistic,
and secondly, because MARC is flexible enough to be used in
new software applications that do new tricks with the old
stuff AND are able to deal with some new data elements in
novel ways.


The MARC format, aka ISO 2709, may have that flexibility, but I'm not  
convinced that the way that we have used MARC lives up to that. The  
atomized data that we do have, which is found in the fixed fields and  
some of the 0XX fields, is often not filled in when it should be. The  
same is true for structured headings, like the current uniform title.  
It is easy to find records for translations that do not have a uniform  
title for the original. Music catalogers are diligent about the music  
uniform title, but considerably less diligent in filling in the 047  
which is a structured form of musical composition, or the 048 for  
number of instruments or voices. The fact is that the computable area  
of our record has been treated as secondary.
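
How unevenly the coded fields are filled in can be measured rather
than guessed. A rough Python sketch, assuming records have already
been parsed into plain dicts keyed by tag (a hypothetical layout, and
the three sample records are made up):

```python
from collections import Counter
from typing import Dict, Iterable, List


def field_coverage(records: List[Dict[str, list]],
                   tags: Iterable[str]) -> Dict[str, float]:
    """Return, per tag, the share of records with at least one value."""
    tags = list(tags)
    counts = Counter()
    for record in records:
        for tag in tags:
            if record.get(tag):  # a missing or empty field counts as absent
                counts[tag] += 1
    total = len(records)
    return {tag: (counts[tag] / total if total else 0.0) for tag in tags}


# Made-up sample: every record has a uniform title, few have coded data.
sample = [
    {"240": ["Sonatas, piano"], "048": ["ka01"]},
    {"240": ["Quartets, strings"]},
    {"240": ["Songs"]},
]
print(field_coverage(sample, ["240", "047", "048"]))
```

Run against a real file, figures like these would show which of the
computable fields are reliable enough for a system to build on.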


And don't anyone come back and tell me that it's because systems don't  
do anything with it. It's a chicken and egg problem: systems can't do  
anything with it unless the data has been provided consistently, and  
the data isn't provided consistently because systems don't do anything  
with it. The foundation of this problem is that catalogers are being  
asked to create two parallel sets of data: one that is visible to the  
users, and one that should satisfy machine needs. We should be doing  
everything we can with a single set of data because it is just human  
nature that doing things twice will mean that something -- especially  
the less visible thing -- doesn't get done.


kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


[RDA-L] Fwd: [RDA-L] Browse and search BNB open data

2011-08-05 Thread Gene Fieg
-- Forwarded message --
From: Gene Fieg gf...@cst.edu
Date: Fri, Aug 5, 2011 at 12:42 PM
Subject: Re: [RDA-L] Browse and search BNB open data
To: kco...@kcoyle.net


Sometimes that MARC data isn't there because of local policies as well.

As for systems not being able to use the data, the system people here
finally changed a 7XX 02 from alternate title to Contains...






-- 
Gene Fieg
Cataloger/Serials Librarian
Claremont School of Theology
gf...@cst.edu





Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Bernhard Eversberg

03.08.2011 17:42, McRee Elrod:

How anyone comparing the XML and MARC versions could prefer the XML is
beyond me.  We find it simple to crosswalk from MARC to XML for anyone
who wants it, but not back again.


The latter is what we had to do in order to construct our database.
Sure you can't get full MARC21 out of the stuff, but as BL has said,
the current version is only a beginning.
(Notwithstanding, I think you *can* find a thing or two in the
database as it is.)

The broader issue of whether or not to adopt XML will indeed have to be looked at.
XML has been around for quite a while, and it has been showered with
much enthusiasm. Not only that, but many an ambitious attempt has been
made at doing metadata in a big way in XML, by more than a few good
fellows eager to prove something.
Well, we are all set to applaud the first compelling success. Why not
take our solidly non-XML BNB database as a benchmark to surpass in a
big way with an XML implementation? Doing new tricks not otherwise doable.

But seriously, XML is certainly inadequate as a medium for data input
and editing. A software interface will have to shield the raw XML
entirely from the view of catalogers. And that's rather curious because
XML is praised for being able to use human-readable tagging. But as
not only Mac has found, how readable actually is an XML record when
compared with a MARC record? The verbal tags only make the clueless
think they understand what they read, but tag numbers, besides being
language independent, can convey much more meaning and, as we all
know, become a shorthand language that is more precise and faster
for actual communication than cumbersome verbal tags as we see them
in any attempts of XML metadata. XML may be many things, but it is
not economical, in more than one way.
These may be old-school views. Just prove me wrong. Only in practice,
not in theory.

Okay then, what now? What's going to be the medium and paradigm for
the MARC successor? This question needs an answer, and soon, if RDA
is to have a future and if this future is to begin in early 2013.

B.Eversberg


Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Bernhard Eversberg

Karen Coyle wrote,

... recent Code4Lib journal:

http://journal.code4lib.org/articles/5468

 One of the difficulties of deciding what we do and do not want to keep
 in MARC, or what we want to move over to the RDA environment, is that we
 have no dictionary of everything that MARC covers. For example, what
 standard identifiers are available in MARC? They are scattered all 
over the format,...


Yours is a worthwhile endeavor, no doubt.

You may try a database which, although as good as current, has been in
existence for a long time and under a somewhat old-fashioned interface.
And it covers not just MARC but several other formats as well, even
Unimarc and the old BNBMARC and a few more obscure ones.

You get into the alphabetical list of field and subfield names
directly like this, (add your keyword to the end of it)

http://www.biblio.tu-bs.de/db/formate/page.php?urG=KWDurA=24urS=

There's also a MARC tag index:

http://www.biblio.tu-bs.de/db/formate/page.php?urG=MRCurA=24urS=...

The alphabetical listing contains all sorts of words, even German ones,
but all the MARC terms are marked M21 plus the actual MARC tag.

May it help,
B.Eversberg


Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Scharff, Mark
James Weinheimer, speculating on the effects of moving MARC data to RDF XML, 
said at one point



 Compare this [loss of subfielding in 6XX fields] to losing the subfields in 
the 1xx/7xx, where the consequences would appear to be much fewer.



I'm not expert in XML, but I would surmise that losing the ability to 
distinguish the title subelements from the name of a person in what is now a 
MARC 700 12 field (i.e. an analytical added entry) would have detrimental 
effects for retrieval of music materials.  There are times when it's desirable 
to provide title phrase or keyword access to the title subelements in such 
fields.  Simply saying that those subelements will occupy an XML title field 
(somewhat equivalent to a MARC 740 02 added entry) runs the risk, I fear, of 
losing the link between the name of the person and the title.  That link is 
currently broken in our library's public catalog in terms of search 
redirection, and the lack of the link causes all sorts of mischief.
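
The concern can be illustrated with a small Python sketch. The class
and the sample entries are hypothetical; the point is only that a
structure which keeps the 700 name and title subelements together
still allows title keyword access, while two flattened lists cannot
say which title belongs to which person.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NameTitleEntry:
    """Hypothetical stand-in for an analytical added entry (700 12)."""
    name: str   # would come from 700 $a (and $d)
    title: str  # would come from 700 $t


def find_by_title_keyword(entries: List[NameTitleEntry],
                          keyword: str) -> List[NameTitleEntry]:
    # Title keyword access that still returns the linked name.
    return [e for e in entries if keyword.lower() in e.title.lower()]


entries = [
    NameTitleEntry("Schubert, Franz", "Winterreise"),
    NameTitleEntry("Schumann, Robert", "Dichterliebe"),
]

# Linked form: the hit carries its composer with it.
print(find_by_title_keyword(entries, "winterreise"))

# Flattened form: both lists are searchable, but the pairing is gone.
flat_names = sorted(e.name for e in entries)
flat_titles = sorted(e.title for e in entries)
print(flat_names, flat_titles)
```

Whatever the XML looks like, the name and the title would need to
stay inside one containing element for the first kind of search to
remain possible.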



Or I could be misunderstanding the entire thing and exposing my Luddite self.  
It's happened before.



Mark Scharff, Music Cataloger

Gaylord Music Library

Washington University in St. Louis

mscha...@wustl.edu







Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread James Weinheimer

Karen,

Thanks for sharing the article. It is really fascinating, although 
depressing. It is obviously a huge, very difficult and tedious 
undertaking, and from your experience, it seems that it will require the 
work of many people over many years. When I think about the fixed 
fields, I remember when I was at Princeton and how I reworked the online 
MARC format from LC, which was very difficult to work with at that time. 
I started work with the variable fields, and it was a lot of work, but I 
did it. Then I started on the fixed fields, thinking that the hard part 
was over, but I remember how my arm hurt at the end (working with the 
mouse) while it did not hurt with the variable fields. I was shocked by 
how incredibly complex the fixed fields are. My own two cents: the fixed 
fields are a lot of work for little payback. They can be cut way back.


Anyway, it's too bad all of this wasn't started long ago but you have to 
play the cards you are dealt!


My real concern is that we haven't got years to do this and we need to 
create something that works now, saves money now, and can be 
demonstrated as soon as possible. The BNB mapping is interesting, even 
though so much is lost--still, it is a start, and I think it's great.


I'll continue to think about your work, which is definitely important, 
and what to do. I do have one point, which I am not sure is completely 
clear from your documents. In 
http://futurelib.pbworks.com/w/page/29114548/MARC%20elements, you 
mention that 1923 in the 240, Odyssey. English. 1923, repeats the date 
of publication. This is correct but also incorrect(! I know that kind of 
statement is awful!) and therefore, is not really repeated information.


What the date in the 240 is supposed to represent, although it is highly 
inconsistent in practice, is to break a conflict with another uniform 
title (i.e. 1xx/240 combination). They do this mostly with a publication 
date (unfortunately), and I would prefer something more meaningful, e.g. 
the name of the translator, and if necessary, edition statement, or 
something more meaningful than a publication date. But the rule is that 
mostly, you use the publication date of the first manifestation of the 
expression.  (I can't find the rule for this right now, since I don't 
have access to a lot) The only example I can find right now is King 
Kong: http://lccn.loc.gov/90715189, where if you look at the related 
titles, you will see 1933, while the date of publication of this item is 
1984. King Kong (Motion picture : 1933)


It has to be qualified somehow and I guess this is better than King Kong 
(Motion picture : Fay Wray screaming) although this would have much more 
meaning to people.


My next podcast will deal with some of these distinctions in a funny way 
(I hope!). It should come out very soon, so watch for it!

Ciao,
Jim

On 03/08/2011 19:07, Karen Coyle wrote:

Quoting James Weinheimer weinheimer.ji...@gmail.com:


While there is an undoubted loss in semantics, with the future 
evolution of MARC format, we can ask: do such losses have any 
practical consequences? Although I think many subfields (although not 
the information) could disappear without any essential loss, some 
will have important consequences to different communities.


Jim, this is much of the motivation for the work that I have been 
doing to try to identify the actual elements of MARC21 -- elements 
in the semantic sense, trying to ignore the MARC21 structure (which 
results in much repetition, etc.) A report on my study is available in 
the recent Code4Lib journal:


http://journal.code4lib.org/articles/5468

One of the difficulties of deciding what we do and do not want to keep 
in MARC, or what we want to move over to the RDA environment, is that 
we have no dictionary of everything that MARC covers. For example, 
what standard identifiers are available in MARC? They are scattered 
all over the format, so it's hard to know. What about things like 
language and date? Those appear in different fields with somewhat 
different meanings.


My assumption is that a complete inventory of MARC elements is 
essential for any move away from MARC. Unfortunately, I have gotten 
now to the 1xx-8xx fields (the study so far is 00x and 0xx, that's 
already pretty complex!) and may not have the energy to complete the 
study on my own. However, what I have done so far at least sets down 
some possible principles to follow.


I'm doing it all on the futurelib wiki so my process is as transparent 
as I can make it:

http://futurelib.pbworks.com/w/page/29114548/MARC%20elements

kc


--
James Weinheimer  weinheimer.ji...@gmail.com
First Thus: http://catalogingmatters.blogspot.com/
Cooperative Cataloging Rules: http://sites.google.com/site/opencatalogingrules/


Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Karen Coyle

Quoting James Weinheimer weinheimer.ji...@gmail.com:


Karen,

Thanks for sharing the article. It is really fascinating, although  
depressing. It is obviously a huge, very difficult and tedious  
undertaking, and from your experience, it seems that it will require  
the work of many people over many years.


I'd like there to be more folks involved, but it doesn't take years --  
if you are willing to make some decisions that work even though they  
aren't perfect. I've got it all in a database and, while tedious, it's  
not Herculean. I was able to do the fixed fields entirely as  
extraction from the database.


When I think about the fixed fields, I remember when I was at  
Princeton and how I reworked the online MARC format from LC, which  
was very difficult to work with at that time. I started work with  
the variable fields, and it was a lot of work, but I did it. Then I  
started on the fixed fields, thinking that the hard part was over,  
but I remember how my arm hurt at the end (working with the mouse)  
while it did not hurt with the variable fields. I was shocked by how  
incredibly complex the fixed fields are. My own two cents: the fixed  
fields are a lot of work for little payback. They can be cut way back.


What I did with the fixed fields is very simple: each fixed field  
element is a data element with a list of valid values. I didn't try to  
decide if those values overlap with values in the variable fields, or  
to deduplicate between elements. I did ignore the 006 since it is used  
only to make certain 008 elements repeatable, and therefore adds no  
new information (as a field... in records it does add more, but so  
does any element that is repeatable).
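
As a rough illustration of "a data element with a list of valid  
values", here is a Python sketch. The class is hypothetical and is  
not how the study's database is built; the value list is only a  
partial subset of the codes defined for 008/06.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FixedElement:
    """One fixed-field element: a position, a length, and valid codes."""
    name: str
    start: int
    length: int
    valid_values: Dict[str, str]

    def extract(self, field_008: str) -> str:
        return field_008[self.start:self.start + self.length]

    def is_valid(self, field_008: str) -> bool:
        return self.extract(field_008) in self.valid_values


# Partial, illustrative subset of the 008/06 codes.
TYPE_OF_DATE = FixedElement(
    name="Type of date/Publication status",
    start=6,
    length=1,
    valid_values={
        "s": "Single known date/probable date",
        "m": "Multiple dates",
        "r": "Reprint/reissue date and original date",
        "q": "Questionable date",
    },
)

sample_008 = "110805s2011    enk"  # shortened to the first positions
print(TYPE_OF_DATE.extract(sample_008), TYPE_OF_DATE.is_valid(sample_008))
```

Each element defined this way stands on its own, which is why no  
deduplication against the variable fields is needed at this stage.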


What is turning out to be interesting with the 1xx-8xx fields is how  
they align with RDA (which I should have expected -- maybe it's  
different discovering it for yourself). Also interesting is where they  
differ. That part I would love to be able to discuss with folks with a  
cataloging background, but we should take it off list. I can add  
people to the futurelib wiki as editors and we can create pages that  
discuss certain issues.


Of course, this may interfere with things like having free time,  
sleeping, eating or maintaining human relationships. :-)


kc




Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Karen Coyle

On a different note and more details:

Quoting James Weinheimer weinheimer.ji...@gmail.com:





What the date in the 240 is supposed to represent, although it is  
highly inconsistent in practice, is to break a conflict with another  
uniform title (i.e. 1xx/240 combination). They do this mostly with a  
publication date (unfortunately),


Same as with authority-controlled names, right?

and I would prefer something more meaningful, e.g. the name of the  
translator, and if necessary, edition statement, or something more  
meaningful than a publication date.


ditto something more meaningful than date of birth.
  http://kcoyle.blogspot.com/2007/09/name-authority-control-aka-name.html

 But the rule is that mostly, you use the publication date of
the first manifestation of the expression.  (I can't find the rule  
for this right now, since I don't have access to a lot) The only  
example I can find right now is King Kong:  
http://lccn.loc.gov/90715189, where if you look at the related  
titles, you will see 1933, while the date of publication of this  
item is 1984. King Kong (Motion picture : 1933)


Aha! Thanks. Although... isn't this an even more arcane bit of data  
than the first date of the work? And many (including you) were  
doubtful that catalogers could supply that.


In general I am having a hard time understanding how we will treat  
these kinds of composite headings in any future data carrier. They  
seem to be somewhat idiosyncratic, in that what data gets added is up  
to the cataloger, depends on the context, and probably cannot be  
generated algorithmically. This whole part about headings (access  
points in RDA, I believe) has me rather stumped from a design point of  
view. At the same time, if all of the individual elements are  
available, and one links manifestations of a single expression, then  
some system feature may be able to display this distinction to the  
user without the use of individual cataloger-formed headings. This  
would also mean that the records can be created without being  
dependent on a particular context, which should make sharing of data  
even more accurate.
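
A sketch of what such a system feature might look like, in Python.  
Everything here is assumed for illustration: the expression  
identifiers, the dict layout and the sample data, which only loosely  
follow the King Kong example. The distinguishing date is derived at  
display time from the linked manifestations instead of being typed  
into a heading.

```python
from typing import Dict, List


def expression_labels(manifestations: List[dict]) -> Dict[str, str]:
    """Label each expression with the year of its earliest manifestation."""
    earliest: Dict[str, dict] = {}
    for item in manifestations:
        key = item["expression"]
        best = earliest.get(key)
        if best is None or item["year"] < best["year"]:
            earliest[key] = item
    return {
        key: f'{item["title"]} ({item["year"]})'
        for key, item in earliest.items()
    }


# Illustrative data; "expr-1" and "expr-2" are hypothetical identifiers.
manifestations = [
    {"expression": "expr-1", "title": "King Kong", "year": 1984},
    {"expression": "expr-1", "title": "King Kong", "year": 1933},
    {"expression": "expr-2", "title": "King Kong", "year": 1976},
]
print(expression_labels(manifestations))
```

This only works to the extent that the earliest manifestation has  
actually been recorded and linked; where it has not, the derived  
label would be wrong in the same way a guessed heading would be.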


kc





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Jonathan Rochkind

On 8/4/2011 3:33 PM, Karen Coyle wrote:
In general I am having a hard time understanding how we will treat 
these kinds of composite headings in any future data carrier. They 
seem to be somewhat idiosyncratic, in that what data gets added is up 
to the cataloger, depends on the context, and probably cannot be 
generated algorithmically. 


I'm thinking these sorts of headings should essentially be treated as 
opaque identifiers -- they were meant to basically serve the purpose of 
identifiers, thus adding more-or-less arbitrary (some date you choose 
with meaning, your choice!) characters on the end to disambiguate, same 
as you'd add a more-or-less arbitrary path component on to the end of a 
URI to make sure it's unique, but expecting the finished URI string to 
be treated basically as an opaque identifier.


So if you're stumped, I'd suggest seeing if there's a way to punt and 
treat these kind of headings as single un-subfielded opaque identifiers 
(they're not URI's, they're 'local' identifiers, but they're a kind of 
identifier. Well, 'local' in the sense of local to a particular 
authority file, particular community, or sometimes actually particular 
local system).


Of course, that may cause its own problems: if you just combine ALL 
uniform title subfields into one big opaque 'identifier' string, you might 
be losing useful semantic information that is in some of the other 
subfields. It's tricky, our legacy data is very legacy. (I don't know 
what that means, but I'm sticking to it.)  So maybe just the subfields 
you have to punt on, don't worry about, just call em disambiguating 
suffix or disambiguating date suffix or something.  Since that's all 
they are.


Either way, I think it's probably important and useful to conceptualize 
our legacy headings as legacy semi-opaque 'identifiers'.  For instance, 
it's absolutely vital, to make use of this data with legacy systems, 
that once you've deconstructed these things down to semantic elements, 
the system is still able to reconstruct them into the exact literal 
combined string 'identifier'.  So either your encoding has to somehow 
preserve order (perhaps there is an implicit order to each element, if 
the marc fields for these 'headings' work that way, I'm not sure) -- or 
perhaps there needs to be another 'heading' data element  that will 
include the complete assembled heading string 'identifier', even though 
that is duplication of information.
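
The two options in the paragraph above can be pictured roughly as follows. This is only a sketch: the class name `Heading`, the single-space joining rule, and the choice of sample heading are illustrative assumptions, not anything defined by MARC or RDA.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Heading:
    """Hypothetical container for a legacy heading broken into parts."""

    # Option 1: ordered (subfield code, value) pairs; list order is heading order.
    parts: List[Tuple[str, str]] = field(default_factory=list)
    # Option 2: the complete assembled string, stored redundantly.
    literal: Optional[str] = None

    def assemble(self) -> str:
        """Rebuild the heading string from the ordered parts.

        Assumes parts are joined by one space and that punctuation is
        already carried inside each value (an assumption of this sketch).
        """
        return " ".join(value for _code, value in self.parts)

    def identifier(self) -> str:
        """Prefer the stored literal; fall back to reassembly."""
        return self.literal if self.literal is not None else self.assemble()


heading = Heading(parts=[("a", "Robinson, Hilary,"), ("d", "1962-")])
print(heading.identifier())
```

If the reassembly rule cannot be trusted for some class of headings, storing `literal` trades duplication for a guaranteed exact match with the legacy string.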


Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread John Attig

On 8/4/2011 3:33 PM, Karen Coyle wrote:
In general I am having a hard time understanding how we will treat 
these kinds of composite headings in any future data carrier. They 
seem to be somewhat idiosyncratic, in that what data gets added is up 
to the cataloger, depends on the context, and probably cannot be 
generated algorithmically. This whole part about headings (access 
points in RDA, I believe) has me rather stumped from a design point of 
view. At the same time, if all of the individual elements are 
available, and one links manifestations of a single expression, then 
some system feature may be able to display this distinction to the 
user without the use of individual cataloger-formed headings. This 
would also mean that the records can be created without being 
dependent on a particular context, which should make sharing of data 
even more accurate.


I'm glad that Karen brought this up again.  I missed the discussion in 
which she asked about access points in RDA; by the time I caught up, the 
discussion had moved on.


Access points are treated rather strangely in RDA.  The access point is 
not itself an element, but is a construct made up of other elements, 
which contains instructions about what and when to include various 
elements in an access point.


[Note: In this, RDA follows the FRBR model, which lacks elements for 
access points.  On the other hand, FRAD treats the access point as an 
entity in its own right, separate from the person, family, corporate 
body, work, expression, manifestation, or item that it represents.  At 
some point, RDA may decide to adopt this FRAD structure (assuming that 
it survives the reconciliation of the FR models).]


In our discussions of the question of how to treat access points, the 
JSC was advised that there were certain structural complexities that we 
should not attempt to build into the RDA element set, but should rely on 
the encoding to bring together the various elements into the access 
point construct.  In MARC, we are accustomed to using subfields to 
encode the specific data elements and fields to wrap them up into an 
ordered construct.  Similarly, in XML, one would expect to use some sort 
of wrapper to enclose all the elements that make up the access point.  
In order to do this, I suspect that one needs to treat the access point 
construct as if it were an element, even if the RDA element set does not 
treat it as such.


Beyond these technical issues, this discussion raises questions about 
the way in which access points are constructed and used.


a) The instructions on what to include in an access point represent our 
collective experience of what is important for uniquely identifying a 
given entity.  There seems to be some value in gathering all these 
elements together for indexing and display as an aggregation of 
identifying information.


b) While it is true that the individual elements are sufficient for 
finding relevant resources and don't need to be aggregated in a 
precoordinated way in order to work, I would argue that finding, 
identifying, and selecting relevant resources is sometimes best 
supported by browsing an alphabetical list of access points that are 
constructed in a way that reveals the structure of the things being 
browsed.  Examples might be an alphabetical display of hierarchical 
entities such as corporate bodies, or an organized sequence of headings 
representing works and expressions.  We may not NEED access points, but 
they can sure be helpful on occasion.


c) In order to work, some thought needs to be given to the structure of 
the data, so that the sequence of access points reveals that structure.  
Traditionally we have done this by hand-crafting precoordinated access 
points according to instructions that aim to provide the best result 
that can be anticipated and applied globally.  This may not be the best 
way of doing things.


d)  While many of us are skeptical of the ability of algorithms to 
create such structured access points automatically, it is certainly 
worth the attempt.  If there could be a clear set of objectives for the 
exercise, algorithms might in fact be possible, bringing together 
relevant elements and arranging them in a significant order to form the 
access points.  Even better, it might be possible to (i) offer different 
options for sequencing the elements -- sorting first by language or by 
format, for example -- and/or (ii) work in real time to formulate the 
best way of sequencing a given result set.  Catalogers tend to resist 
giving up their hand-crafted headings, but that tends to be because they 
are not offered attractive alternatives.  What I suggested above seems 
to be such an attractive alternative.
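
As a rough illustration of point (d), access points could be assembled from separate elements with the sequencing chosen at display time. The element keys, the sample records, and the ". " separator below are all made up for the sketch; they are not RDA element names or real catalog data.

```python
from typing import Dict, List, Sequence

# Hypothetical expression-level elements (illustrative keys and values only).
records: List[Dict[str, str]] = [
    {"title": "King Kong", "language": "Spanish", "format": "DVD"},
    {"title": "King Kong", "language": "English", "format": "Film"},
    {"title": "King Kong", "language": "English", "format": "DVD"},
]


def access_point(rec: Dict[str, str], order: Sequence[str]) -> str:
    """Join the chosen elements, in the chosen order, into one string."""
    return ". ".join(rec[key] for key in order if rec.get(key))


def browse_list(recs: List[Dict[str, str]], order: Sequence[str]) -> List[str]:
    """Build one access point per record and sort the results for browsing."""
    return sorted(access_point(r, order) for r in recs)


# Same data, two sequencing options.
by_language = browse_list(records, ["title", "language", "format"])
by_format = browse_list(records, ["title", "format", "language"])
print(by_language)
print(by_format)
```

Nothing is stored as a precoordinated string here; the browse order is a property of the view, not of the record.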


John Attig
Authority Control Librarian
Penn State University
jx...@psu.edu


Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Brenndorfer, Thomas
 -Original Message-
 From: Resource Description and Access / Resource Description and Access
 [mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of John Attig
 Sent: August 4, 2011 4:09 PM
 To: RDA-L@LISTSERV.LAC-BAC.GC.CA
 Subject: Re: [RDA-L] Browse and search BNB open data


...

 Access points are treated rather strangely in RDA.  The access point is
 not itself an element, but is a construct made up of other elements,
 which contains instructions about what and when to include various
 elements in an access point.

 [Note: In this, RDA follows the FRBR model, which lacks elements for
 access points.  On the other hand, FRAD treats the access point as an
 entity in its own right, separate from the person, family, corporate
 body, work, expression, manifestation, or item that it represents.  At
 some point, RDA may decide to adopt this FRAD structure (assuming that
 it survives the reconciliation of the FR models).]


There are a few areas where the distinction between the access point as an 
element and as an entity can be confusing.

The equivalent of the undifferentiated name indicator in FRAD is an attribute 
of the entity controlled access point.

In RDA 8.11, it's an attribute for a Person entity-- it's used when the core 
elements for a Person are not sufficient for differentiation. [I'm aware of the 
error in RDA that is being corrected-- the placement of this element in 8.11 
suggests it applies also to corporate bodies and families when it does not].


However, in RDA 9.19.1.1, there is an instruction to use the undifferentiated 
name indicator when the access point cannot make use of any suitable addition 
to differentiate persons with the same name.


Those instructions suggest there's a relationship between the core elements and 
the elements that go into forming an authorized access point. But the relationship 
between the two processes (recording core elements and constructing access 
points) is not spelled out clearly enough to show that the two instructions 
do not actually conflict.


There are some other areas where the connections between the elements and the 
access point suggest implications that are not readily apparent from the basic 
instructions.

For example, when settling upon a preferred name or preferred title, RDA 
consistently instructs to do so in light of their use as the basis for the 
authorized access point (example at RDA 9.2.2.1). That suggests that in 
environments that don't use authorized access points, decisions still need to 
be made that support the ongoing existence of authorized access points.


In the RDA Element Set View (under the Tools tab in the RDA Toolkit), one FRAD 
entity is listed, and that is Name. It has attributes such as Date of usage 
and Scope of usage-- attributes that don't really make sense applied to the 
Person entity. Rather, they belong to the Name entity (and in FRAD, the Name 
entity is also separate from the Person entity and the Access point entity-- 
there are relationships between these entities that are spelled out in FRAD).


While the main text of RDA subsumes entities like access point and name 
into the instructions for the main entity, such as Person, there are points at 
which it seems that the FRAD approach might be useful.

As an example, access points in FRAD have relationships to Rules and 
Agencies, as well as a set of attributes such as language and script. 
These additional bits of information would make sense clustered together as 
attributes and relationships around the respective entities.
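
One way to picture that clustering, purely as a sketch: three separate entities, each carrying its own attributes, linked by references. The class and field names are illustrative assumptions loosely following the FRAD terms mentioned above, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    label: str  # stand-in for whatever identifies the person


@dataclass
class Name:
    value: str
    person: Person                        # relationship: name of a person
    date_of_usage: Optional[str] = None   # attribute of the name, not the person
    scope_of_usage: Optional[str] = None


@dataclass
class AccessPoint:
    text: str
    based_on: Name                  # relationship: access point built on a name
    language: Optional[str] = None
    script: Optional[str] = None
    rules: Optional[str] = None     # relationship to Rules (simplified to a string)
    agency: Optional[str] = None    # relationship to Agency (simplified to a string)


person = Person(label="person-1")
name = Name(value="Robinson, Hilary", person=person)
ap = AccessPoint(text="Robinson, Hilary, 1962-", based_on=name, rules="RDA")
print(ap.based_on.person.label)
```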

Thomas Brenndorfer
Guelph Public Library


Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Karen Coyle

Quoting John Attig jx...@psu.edu:

John, thank you so much -- this is very helpful. Wonderful, even.



Access points are treated rather strangely in RDA.  The access point  
is not itself an element, but is a construct made up of other  
elements, which contains instructions about what and when to include  
various elements in an access point.


That actually makes sense from a data design point of view. It means  
that compound things can be built up of simple things, and that  
means that you have flexibility in what you can build. (read:  
tinker-toys, or, for the younger set, Legos)





In our discussions of the question of how to treat access points,  
the JSC was advised that there were certain structural complexities  
that we should not attempt to build into the RDA element set, but  
should rely on the encoding to bring together the various elements  
into the access point construct.


Here I want to point out that there can be a useful difference between  
your data elements, data model and your instance data. Your data  
elements can be atomistic, your data model can allow building of  
various molecules from the atoms, and your instance data can make  
use of the whole in many different ways.


In MARC, we are accustomed to using subfields to encode the specific  
data elements and fields to wrap them up into an ordered construct.   
Similarly, in XML, one would expect to use some sort of wrapper to  
enclose all the elements that make up the access point.  In order to  
do this, I suspect that one needs to treat the access point  
construct as if it were an element, even if the RDA element set does  
not treat it as such.


I could also imagine that happening in an application layer. Without  
any change in the underlying data there could be different  
interpretations -- I usually call them views -- of the data. [Your  
last para states this very well; see below] But the key thing is that  
by not having the individual elements bound into the RDA complex  
elements you have freed the sub-elements from that structure, and  
they can be used in various ways if desired. In a situation where the  
only way to express date of expression is in an access point, you  
have restricted that data element (which may be of interest for other  
reasons) to that one situation.
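
A small sketch of the "views" idea, under the assumption that date is held as its own element: the same unchanged data can feed a heading-style display and an unrelated date filter. The function names and the heading pattern are illustrative only.

```python
from typing import Dict, List

# Illustrative data: date is a separate element, not buried in a heading string.
expressions: List[Dict[str, str]] = [
    {"work": "King Kong", "form": "Motion picture", "date": "1933"},
    {"work": "King Kong", "form": "Motion picture", "date": "2005"},
]


def heading_view(e: Dict[str, str]) -> str:
    """View 1: a heading-like display string built on demand."""
    return f"{e['work']} ({e['form']} : {e['date']})"


def issued_before(items: List[Dict[str, str]], year: int) -> List[Dict[str, str]]:
    """View 2: reuse the same date element for filtering."""
    return [e for e in items if int(e["date"]) < year]


print([heading_view(e) for e in expressions])
print(issued_before(expressions, 1950))
```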


The more I look at RDA as elements the more I admire the separation of  
data content from record structure. This gives us many more  
possibilities for system developers.




Beyond these technical issues, this discussion raises questions  
about the way in which access points are constructed and used.


a) The instructions on what to include in an access point represent  
our collective experience of what is important for uniquely  
identifying a given entity.  There seems to be some value in  
gathering all these elements together for indexing and display as an  
aggregation of identifying information.


Yes, and that should be possible.



b) While it is true that the individual elements are sufficient for  
finding relevant resources and don't need to be aggregated in a  
precoordinated way in order to work, I would argue that finding,  
identifying, and selecting relevant resources is sometimes best  
supported by browsing an alphabetical list of access points that are  
constructed in a way that reveals the structure of the things being  
browsed.  Examples might be an alphabetical display of hierarchical  
entities such as corporate bodies, or an organized sequence of  
headings representing works and expressions.  We may not NEED access  
points, but they can sure be helpful on occasion.


I think this becomes a system efficiency question rather than a  
meaning question. At what point do systems need to manage these  
strings for the most efficient use? Is it easier to create them  
automatically in case they are needed, storing the data in multiple  
places? Or will there be a reason to bring this data together on the  
fly?


I don't think we need to answer that at this point, but I would like  
to suggest that it would be ideal to begin to develop use cases. Use  
cases state a situation (user is looking for xyz), and what you would  
like the outcome to be (user gets/sees/is asked for...). There may be  
more than one way to do this.  You have included a use case here in  
suggesting an alphabetical display. There are undoubtedly search and  
find use cases related to this information (user wants bookX but only  
in Spanish), etc. Since a system should attempt to satisfy a variety  
of use cases, this would help me (and maybe other systems  
developers/thinkers) to understand the range of services we want to  
get out of this data.


In essence, the data as input should not be considered to be the same  
as the strings that the user will see. It was so, often, in MARC, but  
that was kind of a throw-back to the card days. Today one designs user  
interfaces and services BEFORE defining the data structure. The 

Re: [RDA-L] Browse and search BNB open data

2011-08-04 Thread Mark Ehlert
Brenndorfer, Thomas tbrenndor...@library.guelph.on.ca wrote:
 In RDA 8.11, it's an attribute for a Person entity-- it's used when the core 
 elements for a Person are not sufficient for differentiation. [I'm aware of 
 the error in RDA that is being corrected-- the placement of this element in 
 8.11 suggests it applies also to corporate bodies and families when it does 
 not].

Tangent: the last line of 8.6 should be cleared up in the same manner.
 And I think the last line under 10.10.1.1 will get the boot as well.
On the other hand, I wouldn't mind future-proofing corporate and,
especially, family names by allowing undifferentiated status markers
for these.

-- 
Mark K. Ehlert                 Minitex
Coordinator                    University of Minnesota
Bibliographic & Technical      15 Andersen Library
  Services (BATS) Unit        222 21st Avenue South
Phone: 612-624-0805            Minneapolis, MN 55455-0439
http://www.minitex.umn.edu/


Re: [RDA-L] Browse and search BNB open data

2011-08-03 Thread Bernhard Eversberg

02.08.2011 18:34, J. McRee Elrod:

   http://www.allegro-c.de/db/a30/bl.htm

Am I correct that there is no MARC display available?


OK, for what it's worth and for good measure, I've added that in;
no big deal since we've got what it takes.
Now, MARC appears directly underneath the regular display. But only
as complete and as correct as the stuff that was released.
The format made available by BL is an XML schema of their
own design, documented here:

  http://www.bl.uk/bibliographic/datafree.html
  (under Data model  draft schema)

A sample XML record:

<rdf:Description>
  <dcterms:title>The elves and the emperor</dcterms:title>
  <dcterms:creator>
    <rdf:Description>
      <rdfs:label>Robinson, Hilary, 1962-</rdfs:label>
    </rdf:Description>
  </dcterms:creator>
  <dcterms:contributor>
    <rdf:Description>
      <rdfs:label>Sanfilippo, Simona.</rdfs:label>
    </rdf:Description>
  </dcterms:contributor>
  <dcterms:type>
    <rdf:Description>
      <rdfs:label>text</rdfs:label>
    </rdf:Description>
  </dcterms:type>
  <dcterms:type>
    <rdf:Description>
      <rdfs:label>monographic</rdfs:label>
    </rdf:Description>
  </dcterms:type>
  <isbd:P1016>
    <rdf:Description>
      <rdfs:label>London</rdfs:label>
    </rdf:Description>
  </isbd:P1016>
  <dcterms:publisher>
    <rdf:Description>
      <rdfs:label>Wayland</rdfs:label>
    </rdf:Description>
  </dcterms:publisher>
  <dcterms:issued>2009</dcterms:issued>
  <dcterms:language>
    <rdf:Description>
      <rdf:value rdf:datatype="http://purl.org/dc/terms/ISO639-2">eng</rdf:value>
    </rdf:Description>
  </dcterms:language>
  <dcterms:extent>
    <rdf:Description>
      <rdfs:label>31 p</rdfs:label>
    </rdf:Description>
  </dcterms:extent>
  <dcterms:description>Originally published: 2008.</dcterms:description>
  <dcterms:subject>
    <skos:Concept>
      <skos:notation rdf:datatype="ddc:Notation">428.6</skos:notation>
      <skos:inScheme rdf:resource="http://dewey.info/scheme/e22" />
    </skos:Concept>
  </dcterms:subject>
  <dcterms:isPartOf>
    <rdf:Description>
      <rdfs:label>Fairytale jumbles</rdfs:label>
    </rdf:Description>
  </dcterms:isPartOf>
  <dcterms:isPartOf>
    <rdf:Description>
      <rdfs:label>Start reading. Purple band 8</rdfs:label>
    </rdf:Description>
  </dcterms:isPartOf>
  <dcterms:identifier>(Uk)015346892</dcterms:identifier>
  <dcterms:identifier>GBA979108</dcterms:identifier>
  <bibo:isbn>9780750255233</bibo:isbn>
  <bibo:isbn>0750255234</bibo:isbn>
  <dcterms:identifier>URN:ISBN:9780750255233</dcterms:identifier>
  <dcterms:identifier>URN:ISBN:0750255234</dcterms:identifier>
</rdf:Description>

which translates like this:

=LDR  01234cam a22002771i 45e0
=001  015346892
=007  ta
=008  \\991231s2009n\\\eng\d
=020  \\$a9780750255233
=040   $ea
=082  00$a428.6
=100  1\$aRobinson, Hilary (1962-)
=245  04$aThe elves and the emperor /$cHilary Robinson
=260  \\$aLondon :$bWayland,$c2009
=300  \\$a31 p
=440  \0$aFairytale jumbles
=440  \0$aStart reading. Purple band 8
=500  \\$aOriginally published: 2008.
=700  12$aSanfilippo, Simona
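
For readers who want to try a translation of this kind, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the conversion actually used for the display above: the function names are hypothetical, only three fields are handled, indicators are left blank, place of publication is skipped, and the namespace declarations on the sample are added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the prefixes used in the sample.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
}

SAMPLE = """\
<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:title>The elves and the emperor</dcterms:title>
  <dcterms:creator>
    <rdf:Description>
      <rdfs:label>Robinson, Hilary, 1962-</rdfs:label>
    </rdf:Description>
  </dcterms:creator>
  <dcterms:publisher>
    <rdf:Description>
      <rdfs:label>Wayland</rdfs:label>
    </rdf:Description>
  </dcterms:publisher>
  <dcterms:issued>2009</dcterms:issued>
</rdf:Description>
"""


def label(record, prop):
    """Return the rdfs:label nested under the given property, or None."""
    node = record.find(f"{prop}/rdf:Description/rdfs:label", NS)
    return node.text if node is not None else None


def to_marc_lines(xml_text):
    """Map a few properties to MARC-style text lines (blank indicators)."""
    record = ET.fromstring(xml_text)
    lines = []
    creator = label(record, "dcterms:creator")
    if creator:
        lines.append(f"=100  \\\\$a{creator}")
    title = record.findtext("dcterms:title", namespaces=NS)
    if title:
        lines.append(f"=245  \\\\$a{title}")
    publisher = label(record, "dcterms:publisher")
    issued = record.findtext("dcterms:issued", namespaces=NS)
    if publisher or issued:
        lines.append(f"=260  \\\\$b{publisher or ''},$c{issued or ''}")
    return lines


marc_lines = to_marc_lines(SAMPLE)
print("\n".join(marc_lines))
```

Going in this direction, the subfield codes have to be supplied by the mapping, since the XML labels carry none.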


B.E.


Re: [RDA-L] Browse and search BNB open data

2011-08-03 Thread James Weinheimer

On 03/08/2011 08:34, Bernhard Eversberg wrote:
<snip>

02.08.2011 18:34, J. McRee Elrod:

   http://www.allegro-c.de/db/a30/bl.htm

Am I correct that there is no MARC display available?


OK, for what it's worth and for good measure, I've added that in;
no big deal since we've got what it takes.
Now, MARC appears directly underneath the regular display. But only
as complete and as correct as the stuff that was released.
The format made available by BL is an XML schema of their
own design, documented here:

  http://www.bl.uk/bibliographic/datafree.html
  (under Data model  draft schema)

</snip>

This is interesting. From the table 
http://www.bl.uk/bibliographic/pdfs/marctordfxmlmappingsv0-3-2.pdf, we 
see how some of the semantics of the MARC format are lost in the 
conversion. As we evolve away from the MARC format, I am sure the 
direction will be toward simplification, so it seems valuable to discuss 
what could be eliminated from MARC with the fewest consequences. From a 
very quick review of that table, I see the 534 being translated to 
dcterms:description, losing some handy subfields, and all of the 
subfields in the 100/700 fields mapping to dcterms:creator. Also, all of 
the subfields in the 6xx fields are being placed into dcterms:subject, 
and there is a loss of the subfield description avxyz.


I need to emphasize that this is discussing losing the specific subfield 
*coding*, NOT losing the information, e.g.

100 0_ |a Benedict |b XVI, |c Pope, |d 1927-
as opposed to
<dcterms:creator>Benedict XVI, Pope, 1927-</dcterms:creator>

In practical terms for all the various metadata communities, where 
precisely is the loss here?


While there is an undoubted loss in semantics, with the future evolution 
of MARC format, we can ask: do such losses have any practical 
consequences? Although I think many subfields (although not the 
information) could disappear without any essential loss, some will have 
important consequences to different communities. For instance, we see in 
the mapping the complete elimination of 245$c, which would obviously 
have important consequences for *librarians* (i.e. necessary for 
determination of a copy), although the loss of 245$c would be much less 
dire for the users. The subfield losses with the greatest consequences 
would seem to be those in the 6xx fields, since those semantics 
*could* lead to novel computer manipulation: sorting by chronology, by 
geography, and in all kinds of other ways. Also, the distinctions of:

650$aHistory$xBibliography
650$aHistory$vBibliography
650$aBibliography$xHistory

would be lost.
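
The loss can be shown with a small sketch. The parsing helper and the flattening rule (join the values with a space) are assumptions made for illustration; in MARC21, $x is a general subdivision and $v a form subdivision.

```python
import re
from typing import List, Tuple


def parse_subfields(field: str) -> List[Tuple[str, str]]:
    """Split '$aHistory$xBibliography' into (code, value) pairs."""
    return re.findall(r"\$(\w)([^$]*)", field)


def flatten(field: str) -> str:
    """Drop the subfield codes and keep only the text, as a flat mapping would."""
    return " ".join(value for _code, value in parse_subfields(field))


fields = [
    "$aHistory$xBibliography",  # bibliography as a topical subdivision
    "$aHistory$vBibliography",  # bibliography as the form of the item
    "$aBibliography$xHistory",  # history of bibliography
]

coded = {tuple(parse_subfields(f)) for f in fields}
flat = {flatten(f) for f in fields}
print(len(coded), "distinct coded headings")
print(len(flat), "distinct flattened strings")
```

The first two headings collapse into the same string once the codes are gone, while the third stays distinct only because of word order.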

Compare this to losing the subfields in the 1xx/7xx, where the 
consequences would appear to be much fewer.


Yet, compare this to what others want: even more semantics, for example, 
to encode 300$a even further to specify pages or leaves or whatever. e.g.

<300>
  <a>
    <pages>245</pages>
    <leaves>56</leaves>
  </a>
</300>
etc.

There are definite advantages with this level of coding but on the 
negative side, it is more work, prone to many more errors, and is more 
difficult to train new people, especially as there will be the push to 
simplify.


I think these questions will begin to be asked (finally!), and answered 
too. This project from the British Library may be a great catalyst for 
the discussion.


--
James Weinheimer  weinheimer.ji...@gmail.com
First Thus: http://catalogingmatters.blogspot.com/
Cooperative Cataloging Rules: http://sites.google.com/site/opencatalogingrules/



Re: [RDA-L] Browse and search BNB open data

2011-08-03 Thread Bernhard Eversberg

Am 03.08.2011 10:55, schrieb James Weinheimer:


There are definite advantages with this level of coding but on the 
negative side, it is more work, prone to many more errors, and is more 
difficult to train new people, especially as there will be the push to 
simplify.


I think these questions will begin to be asked (finally!), and 
answered too. This project from the British Library may be a great 
catalyst for the discussion.

The BL has teamed up with Talis to develop and improve their
open data activities. Here's more about that, together with a nice
diagram any cataloger might love to mount on their office wall:

  http://consulting.talis.com/2011/07/british-library-data-model-overview/

I understand that the current release is only a first step, and
together with Talis they will produce an improved version in the
near future.

B.Eversberg



Re: [RDA-L] Browse and search BNB open data

2011-08-03 Thread metadata
Bernhard and all, 

In order to clarify the current situation, The British Library would like to 
take this opportunity to outline the range of free/open BNB options and 
encourage anyone seeking details to check 
http://www.bl.uk/bibliographic/datafree.html for further information. We would 
like to emphasise the experimental nature of this work and the likelihood that 
datasets we make available will be subject to change over time. As a result, we 
would recommend that those wishing to use the most up to date version of the 
BNB dataset obtain it directly from the BL. Older versions available from other 
sites have now been superseded and we will be contacting organisations we 
identify mounting these to offer updated versions.

The current BNB options are: 

1) BNB as linked data (the latest free data release, in association with Talis) 
- Available under a CC0 license using: SPARQL, Describe and Search endpoints. 
This dataset has been updated from an initial preview version of around 400,000 
records to cover over 2.6 million monographs (80,249,538 triples); we hope to 
also offer a dump of the file via FTP shortly using the new data model 
(available at http://www.bl.uk/bibliographic/pdfs/datamodelv1_01.pdf) and 
schema (available at 
http://www.bl.uk/bibliographic/pdfs/britishlibrarytermsv1-01.pdf). 

2) BNB in basic RDF/XML via FTP (the dataset currently under discussion) - 
Available under a CC0 license to individual researchers or organisations not 
requiring MARC21 data but wishing to data mine, mash up or otherwise 
interrogate the data set in bulk. An updated version is currently being 
produced which will be available via FTP directly from the BL - please contact 
metad...@bl.uk for access details. 

3) BNB Z39.50 MARC21 Access - A free registration based service for 
non-commercial use under terms outlined on the British Library free data web 
page at: http://www.bl.uk/bibliographic/datafree.html 

If you have any queries about any of the BNB data offerings, please contact us 
at metad...@bl.uk 

Thank you

Best regards

Corine

Corine Deliot on behalf of Metadata Services, The British Library. 
email: metad...@bl.uk 


From: Resource Description and Access / Resource Description and Access 
[mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Bernhard Eversberg
Sent: 03 August 2011 10:14
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] Browse and search BNB open data

Am 03.08.2011 10:55, schrieb James Weinheimer: 

There are definite advantages with this level of coding but on the negative 
side, it is more work, prone to many more errors, and is more difficult to 
train new people, especially as there will be the push to simplify. 

I think these questions will begin to be asked (finally!), and answered too. 
This project from the British Library may be a great catalyst for the 
discussion.
The BL has teamed up with Talis to develop and improve their
open data activities. Here's more about that, together with a nice
diagram any cataloger might love to mount on their office wall:

  http://consulting.talis.com/2011/07/british-library-data-model-overview/

I understand that the current release is only a first step, and 
together with Talis they will produce an improved version in the
near future.

B.Eversberg


Re: [RDA-L] Browse and search BNB open data

2011-08-03 Thread J. McRee Elrod
In article 4e38ebe3.5090...@biblio.tu-bs.de, you wrote:

OK, for what it's worth and for good measure, I've added that in;
no big deal since we've got what it takes.

Bless your sweet heart.

Did you notice the "not for commercial purposes" in the BL posting? We
are not even going to ask.  No matter how much we give back, as an
outsourcer we are made to feel dirty.

How anyone comparing the XML and MARC versions could prefer the XML is
beyond me.  We find it simple to crosswalk from MARC to XML for anyone
who wants it, but not back again.

Mac


   __   __   J. McRee (Mac) Elrod (m...@slc.bc.ca)
  {__  |   / Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
  ___} |__ \__


Re: [RDA-L] Browse and search BNB open data

2011-08-03 Thread Karen Coyle

Quoting James Weinheimer weinheimer.ji...@gmail.com:


While there is an undoubted loss in semantics, with the future  
evolution of MARC format, we can ask: do such losses have any  
practical consequences? Although I think many subfields (although  
not the information) could disappear without any essential loss,  
some will have important consequences to different communities.


Jim, this is much of the motivation for the work that I have been  
doing to try to identify the actual elements of MARC21 -- elements  
in the semantic sense, trying to ignore the MARC21 structure (which  
results in much repetition, etc.) A report on my study is available in  
the recent Code4Lib journal:


http://journal.code4lib.org/articles/5468

One of the difficulties of deciding what we do and do not want to keep  
in MARC, or what we want to move over to the RDA environment, is that  
we have no dictionary of everything that MARC covers. For example,  
what standard identifiers are available in MARC? They are scattered  
all over the format, so it's hard to know. What about things like  
language and date? Those appear in different fields with somewhat  
different meanings.
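
To make the "scattered identifiers" point concrete, here is a sketch of the kind of lookup such a dictionary would allow. The tag list is deliberately partial, and the record layout (a list of tag/value pairs) and function name are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# A partial list of MARC21 tags that carry standard identifiers.
IDENTIFIER_TAGS: Dict[str, str] = {
    "010": "LCCN",
    "020": "ISBN",
    "022": "ISSN",
    "024": "Other standard identifier",
}

# Simplified record: (tag, value) pairs, with subfields already stripped.
record: List[Tuple[str, str]] = [
    ("001", "015346892"),
    ("020", "9780750255233"),
    ("082", "428.6"),
    ("245", "The elves and the emperor"),
]


def collect_identifiers(fields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pull out (identifier type, value) pairs wherever they occur."""
    return [(IDENTIFIER_TAGS[tag], value) for tag, value in fields if tag in IDENTIFIER_TAGS]


print(collect_identifiers(record))
```

Without an inventory, the contents of a table like `IDENTIFIER_TAGS` have to be rediscovered by every project that needs them.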


My assumption is that a complete inventory of MARC elements is  
essential for any move away from MARC. Unfortunately, I have gotten  
now to the 1xx-8xx fields (the study so far is 00x and 0xx, that's  
already pretty complex!) and may not have the energy to complete the  
study on my own. However, what I have done so far at least sets down  
some possible principles to follow.


I'm doing it all on the futurelib wiki so my process is as transparent  
as I can make it:

  http://futurelib.pbworks.com/w/page/29114548/MARC%20elements

kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [RDA-L] Browse and search BNB open data

2011-08-02 Thread J. McRee Elrod
Bernhard Eversberg posted to RDA-L:

   http://www.allegro-c.de/db/a30/bl.htm

Thank you.  Your skill in making resources available is remarkable.

Am I correct that there is no MARC display available?

I'm copying to Autocat, so that the resource will be more widely known.


   __   __   J. McRee (Mac) Elrod (m...@slc.bc.ca)
  {__  |   / Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
  ___} |__ \__